WebQuilt is a tool for logging and visualizing web traces. Intended as a tool for remote usability evaluations, WebQuilt allows you to capture usage traces (even for sites you don't own), aggregate them together, and visualize the patterns of usage. This documentation is for the WebQuilt Proxy, the application which performs the capturing of web usage data. The proxy utilizes Java Servlet and JSP technology to track users' interaction with the Internet and then store that data by (1) creating a log file of each user's web use and (2) additionally caching the pages a user accesses for later viewing. This data can then be used by the WebQuilt visualization system (a separate component) to visualize and explore this usage data.
Since the WebQuilt Proxy is actually a Java Servlet, it requires a Servlet engine to run. While theoretically this can be done with any viable Servlet and JSP engine (e.g. IBM WebSphere, Apache JServ) the proxy has only been tested on Tomcat, a free implementation available from the Jakarta project, a Java-specific subdivision of the Apache project. Accordingly, this WebQuilt Distribution comes bundled with Tomcat 3.3.1 to allow for easy setup and use. Below are instructions for completing the proxy installation. Following that, there are instructions for configuring and running the proxy, and an explanation of the log file format the proxy uses.
The distribution zip file should contain the following basic structure:
-- WebQuilt/ => The base WebQuilt directory
|
|-- doc/ => Documentation directory
|
|-- etc/ => Additional files directory
|
|-- logfiles/ => Default logfile directory. WebQuilt user traces stored here.
|
|-- tomcat/ => Tomcat Servlet Engine directory
| |
| |-- bin/ => Tomcat startup/shutdown scripts
| |
| |-- conf/ => Tomcat configuration files (including server.xml)
| |
| |-- doc/ => Tomcat documentation
| |
| |-- lib/ => Necessary .jar files (including jsse.jar, jcert.jar, jnet.jar, servlet.jar)
| |
| |-- logs/ => Tomcat log files
| |
| |-- modules/ => Tomcat modules
| |
| |-- native/ => more Tomcat stuff
| |
| |-- webapps/ => Registered web applications (inc. WebQuilt)
| |
| |-- webquilt/ => WebQuilt proxy web application
| |
| |-- startpages/ => Default start pages for proxy (e.g. index.html)
| |
| |-- tasks/ => Task description files
| |
| |-- testpages/ => Test pages (inc. WML files)
| |
| |-- WEB-INF/ => Contains WebQuilt Proxy classes and configuration
| | |
| | |-- classes/ => The WebQuilt Proxy classes
| | |
| | |-- lib/ => Third party libraries the Proxy uses (and their licenses)
| | |
| | |-- web.xml => The WebQuilt Proxy configuration file
| |
| |-- *.jsp, *.jhtml => Various support files which are part of the Proxy application
|
|-- certificate.{bat,sh} => Scripts to help generate your certificate for secure transactions
|
|-- startup.{bat,sh} => Start up WebQuilt & Tomcat
|
|-- shutdown.{bat,sh} => Shut down WebQuilt & Tomcat
|
|-- webquilt.{bat,sh} => Main WebQuilt driver. Use startup and shutdown instead of this.
|
|-- README
If you're reading this right now, there's a good chance you've already done this step. If not, go to the WebQuilt download page and get the proxy distribution. This distribution includes the Tomcat 3.3.1 JSP/Servlet engine the proxy runs on, as well as the JSSE security extensions.
2. Install the Java 1.3 Runtime
Both WebQuilt and Tomcat require the Java 2 Development Kit (JDK) version 1.3 to run. If you don't have this already, you can get it for free from Sun Microsystems here. If you already have the JDK 1.3 installed on your system, you can skip this step. From now on, we will write 'JAVA_HOME' to denote the directory where the JDK is installed (for example "c:\jdk1.3" or "/usr/bin/local/jdk1.3").
It is important that WebQuilt and Tomcat know where you have the JDK 1.3 installed. To facilitate this you can
do one of two things. One is to place a copy of the JDK within the WebQuilt distribution in the folder webquilt\jdk1.3 (equivalently
webquilt/jdk1.3 on Unix). The other, more efficient, method is to set the environment variable JAVA_HOME to the JDK directory.
For Windows NT/2000
Assume you have the JDK installed in the directory "c:\jdk1.3". To set the JAVA_HOME variable within a command line terminal use
the command
set JAVA_HOME=c:\jdk1.3
To set the variable for the whole system (recommended), right click on the "My Computer" icon and select "Properties". This will bring up a new window. In this window, click the "Advanced" tab, and then
click the "Environment Variables..." button. Another window will now come up. In the section titled "User variables for <yourname>" click the "New..." button. This will cause a dialog to appear. In the "Variable Name" field enter "JAVA_HOME", in the "Variable Value" field enter
your JDK1.3 directory (e.g. "c:\jdk1.3"). Now click "OK" for each of the opened windows.
For UNIX
Assume you have the JDK installed in the directory "/usr/java/jdk1.3". If you are running either csh or tcsh as your shell (if you are unsure type 'ps' on the command line and see if either comes up), you can use the command
setenv JAVA_HOME /usr/java/jdk1.3
to set the variable. This will set the variable for that particular terminal. To make it permanent for all terminals next time you login,
copy that line into your .cshrc file in your home directory.
If you are instead running a Bourne shell (e.g. bash), the equivalent command is
export JAVA_HOME=/usr/java/jdk1.3
To make this permanent you can copy this line into your .bashrc file in your home directory.
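As a quick sanity check after setting the variable, the following is a minimal Python sketch (not part of the WebQuilt distribution) that reports whether JAVA_HOME is set and contains a Java launcher. The function name check_java_home is made up for this illustration, and the check assumes only that a JDK keeps its launcher in the bin subdirectory.

```python
import os

def check_java_home(environ=os.environ):
    """Return a problem description, or None if JAVA_HOME looks usable.

    Hypothetical helper for illustration; WebQuilt does not ship it.
    """
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return "JAVA_HOME does not point to a directory: " + java_home
    # A JDK keeps its launcher in bin/ ("java" on Unix, "java.exe" on Windows).
    bin_dir = os.path.join(java_home, "bin")
    launchers = ("java", "java.exe")
    if not any(os.path.isfile(os.path.join(bin_dir, name)) for name in launchers):
        return "no java launcher found under " + bin_dir
    return None

if __name__ == "__main__":
    problem = check_java_home()
    print(problem if problem else "JAVA_HOME looks fine")
```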
The next step is to enable Tomcat (and therefore WebQuilt) to talk over encrypted channels (e.g. https:// URLs). This is done using the JSSE (Java Secure Socket Extension) package, which has been included in this distribution.
While JSSE is installed, it still needs to be registered with the Java runtime environment. To do this, you need
to open the file JAVA_HOME\jre\lib\security\java.security. Here JAVA_HOME denotes the directory in which you have Java
installed. Find the line that looks like
security.provider.1=sun.security.provider.Sun. After the last of the security provider lines, add the line
security.provider.2=com.sun.net.ssl.internal.ssl.Provider
If there are already 2 providers, you should use the number "3" instead; if there are already 3, use "4", and so on. Now save the
updated file.
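The numbering rule above can be sketched in Python. This is only an illustration of how to pick the next free provider number from the contents of java.security. The helper name next_provider_line and the sample file contents are assumptions, and the script does not modify any file.

```python
import re

JSSE_PROVIDER = "com.sun.net.ssl.internal.ssl.Provider"

def next_provider_line(java_security_text, provider=JSSE_PROVIDER):
    """Return the security.provider.N line to add, or None if the
    provider is already registered. Hypothetical helper for illustration."""
    numbers = []
    for line in java_security_text.splitlines():
        match = re.match(r"\s*security\.provider\.(\d+)\s*=\s*(\S+)", line)
        if match is None:
            continue  # comment or unrelated setting
        if match.group(2) == provider:
            return None  # nothing to add
        numbers.append(int(match.group(1)))
    # Use the number after the highest one already taken.
    return "security.provider.%d=%s" % (max(numbers, default=0) + 1, provider)

# Sample contents, assumed for illustration only.
sample = """# List of providers and their preference orders
security.provider.1=sun.security.provider.Sun
security.provider.2=com.sun.rsajca.Provider
"""
print(next_provider_line(sample))
# prints: security.provider.3=com.sun.net.ssl.internal.ssl.Provider
```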
Now we need to generate a certificate for Tomcat. A certificate is used by Tomcat to authenticate
itself to users when they request secure (https://) documents over the web. Fortunately, Java has pre-defined methods
for creating this for you. The easiest way to do this is to run the "certificate" program we've included in the
distribution. On Windows systems this is "certificate.bat" and on Unix systems it is "certificate.sh". If you prefer to
do it yourself, the equivalent command on the command line is:
keytool -genkey -alias tomcat -keyalg RSA
Then answer the prompts that appear. When asked for a password, enter "changeit". If the keytool application
is not found, go to the directory JAVA_HOME\jre\bin\ and try again, as this is where keytool is located.
Now that the proxy has been installed, we need to configure it before we can start using it. This is done by editing the file "web.xml" in the webquilt\WEB-INF\ directory. This is an XML file containing information about the WebQuilt proxy that Tomcat uses to run the application properly. The file includes a number of useful parameters.
The first useful parameter is the "logdir" parameter. This parameter specifies where all the WebQuilt log files
and cached pages are stored on your filesystem. In the web.xml file you should see a block of text that looks like:
<context-param>
<param-name>logdir</param-name>
<param-value>logfiles\</param-value>
<description>
The directory in which to save WebQuilt log files.
</description>
</context-param>
By editing the text in between the "param-value" tags, you can specify the base directory where WebQuilt will
store all the files which keep track of users' interactions on the web. For example, putting "C:\webquilt\logfiles" in
between the "param-value" tags will set that as the location to store the log files.
The next useful parameter is "debug". Setting it to a value of true will cause WebQuilt to run in
debug mode. This will enable a number of options to appear to clients viewing proxied pages - including the ability
to view the current WebQuilt log, an option for users to submit bug reports back to the WebQuilt proxy, and the
capacity to perform synchronized surfing - viewing both proxied and unproxied pages simultaneously, where following
links in the proxied page will cause the corresponding unproxied view to update automatically. Leaving the "debug"
parameter as false will instead instruct the proxy to display options for users to announce completion of
a task or to abandon a task in progress. You can set the "debug" parameter by updating the section of the web.xml file that
looks like this:
<context-param>
<param-name>debug</param-name>
<param-value>false</param-value>
<description>
Specifies whether or not to run the proxy in debug mode.
</description>
</context-param>
Changing the text in between the "param-value" tags will update the debug parameter.
Similarly, look for the "startpage" and "taskdir" parameters to update, respectively, the page that first shows up when the proxy is started and the directory for finding task descriptions.
NOTE: If you change any parameter values while running the proxy, this change will not be reflected until you restart (stop and then start) the proxy. For more info about the web.xml file format, please refer to the Tomcat documentation provided by the Jakarta project.
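To double-check the values you have set, the context parameters can be listed with a few lines of Python. This is a minimal sketch using the standard xml.etree.ElementTree module. The function name read_context_params is made up, and the sample text is a cut-down stand-in for the real web.xml, which contains more than these two entries.

```python
import xml.etree.ElementTree as ET

def read_context_params(web_xml_text):
    """Return {param-name: param-value} for every <context-param> block.

    Hypothetical helper for illustration; not part of WebQuilt.
    """
    root = ET.fromstring(web_xml_text)
    params = {}
    for node in root.iter("context-param"):
        name = node.findtext("param-name", default="").strip()
        value = node.findtext("param-value", default="").strip()
        if name:
            params[name] = value
    return params

# Cut-down stand-in for webquilt/WEB-INF/web.xml (assumed for illustration).
sample = r"""<web-app>
  <context-param>
    <param-name>logdir</param-name>
    <param-value>logfiles\</param-value>
  </context-param>
  <context-param>
    <param-name>debug</param-name>
    <param-value>false</param-value>
  </context-param>
</web-app>"""

for name, value in read_context_params(sample).items():
    print(name, "=", value)
```

To inspect a real installation, read the file's text and pass it to the function. Remember that the proxy itself only picks up changes after a restart.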
To run the proxy, you simply need to start it up using the provided scripts. To do this, launch the file startup.bat (on MS Windows) or startup.sh (on UNIX) in the top WebQuilt directory. To later shutdown WebQuilt execute shutdown.bat (on Windows) or shutdown.sh (on UNIX).
For WebQuilt and Tomcat to run correctly, no other program on the same machine may be using port 80 (the default http:// port) or port 443 (the default https:// port), since WebQuilt uses both. If you are not running any other web servers on the machine, there shouldn't be any problems. If you are running Tomcat under UNIX, you may need to start Tomcat as root (superuser) to gain access to these ports. If you don't have root access, you will need to contact your system administrator.
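If you are unsure whether something is already listening on these ports, one way to find out before starting the proxy is to try connecting to them. The sketch below is a generic check written for this guide, not a WebQuilt tool. The function name port_in_use is an assumption, and it only detects listeners reachable on the local loopback address.

```python
import socket

def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of WebQuilt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for port in (80, 443):
        state = "already in use" if port_in_use(port) else "appears free"
        print("port %d %s" % (port, state))
```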
You are now ready to start logging web usage!
HTML users need to point their web browsers to the machine running the proxy, and access the file "webquilt/webproxy" (either by name, e.g. "http://tasmania.cs.berkeley.edu/webquilt/webproxy", or by IP address, e.g. "http://128.32.12.128/webquilt/webproxy"). This will cause the WebQuilt start page to appear, from which users can type in another URL and then begin surfing.

You can enter a URL in the dialog box, and logging will begin after you click the "Go!" button. WebQuilt will assign a default taskID of "anon" and a random userID. The radio buttons allow you to select a method to display task descriptions. For devices that support DHTML, there is the option of a floating task box. Otherwise, the description can be tagged onto the bottom of the page, or left out completely.
If you'd like to specify a taskID and userID for a particular session, you need to include these in the initial
connection to the proxy. For example, the URL
http://tasmania.cs.berkeley.edu/webquilt/webproxy?wq_taskid=buy+book&wq_userid=fred01 will specify that user
"fred01" is performing task "buy book". You only need to include these for the first transaction of a particular session.
You can also specify a starting webpage other than the WebQuilt default by including
it in the initial request using the query parameter "wq_replace". For example,
http://tasmania.cs.berkeley.edu/webquilt/webproxy?wq_replace=www.berkeley.edu will start a user on an anonymous task
that begins on the UC Berkeley homepage.
All WebQuilt parameters begin with the "wq_" tag. Survey information to be added...
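When setting up many sessions, it can be convenient to generate these start URLs rather than type them. The following Python sketch builds them with the standard urllib.parse.urlencode function, which encodes spaces as "+" as in the examples above. The function name proxy_start_url is made up for this illustration, and combining all three parameters in one URL is an assumption rather than something this document confirms.

```python
from urllib.parse import urlencode

def proxy_start_url(proxy_base, task_id=None, user_id=None, start_page=None):
    """Build the first URL of a WebQuilt session.

    Hypothetical helper for illustration; only the wq_taskid, wq_userid
    and wq_replace parameter names come from the documentation.
    """
    params = []
    if task_id is not None:
        params.append(("wq_taskid", task_id))
    if user_id is not None:
        params.append(("wq_userid", user_id))
    if start_page is not None:
        params.append(("wq_replace", start_page))
    query = urlencode(params)
    return proxy_base + ("?" + query if query else "")

base = "http://tasmania.cs.berkeley.edu/webquilt/webproxy"
print(proxy_start_url(base, task_id="buy book", user_id="fred01"))
# prints: http://tasmania.cs.berkeley.edu/webquilt/webproxy?wq_taskid=buy+book&wq_userid=fred01
```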
WebQuilt organizes its log files based on (a) the task being performed by the user, and (b) the user's ID. These two values can be passed in as query string variables when beginning a user session. If none are provided, WebQuilt defaults to a task of "anon", standing for an anonymous task, and uses the internal session ID for the user ID. The session ID is also appended to any specified user IDs to distinguish repeated tasks by the same user.
The root of the WebQuilt logging directory structure is the directory specified by the "logdir" parameter discussed above in the configuration section. From here each task has its own subdirectory. Each task-specific subdirectory contains both files and directories. The files, which are of the form "taskID-userID.txt", are the actual WebQuilt log files. The directories, which are similarly of the form "taskID-userID", contain the cached web pages: saved copies of the pages the user visited while performing the task. These web pages are renamed by transaction ID, so the first page visited would be 1.html, the second 2.html, and so on.
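The layout described above can be expressed as a small path-building helper. This Python sketch rests on two assumptions that the text does not spell out: that each task subdirectory is named after the task ID, and that the user ID passed in already includes the appended session ID. The function name log_paths and the user ID "u1" are made up for the example.

```python
from pathlib import PurePosixPath

def log_paths(logdir, task_id, user_id, transaction_id):
    """Return (log file path, cached page path) for one transaction.

    Hypothetical helper for illustration. Assumes the task subdirectory
    is named after the task ID and that user_id already carries any
    session ID suffix the proxy appended.
    """
    task_dir = PurePosixPath(logdir) / task_id
    stem = "%s-%s" % (task_id, user_id)
    log_file = task_dir / (stem + ".txt")
    cached_page = task_dir / stem / ("%d.html" % transaction_id)
    return log_file, cached_page

log_file, cached_page = log_paths("logfiles", "anon", "u1", 2)
print(log_file)     # logfiles/anon/anon-u1.txt
print(cached_page)  # logfiles/anon/anon-u1/2.html
```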
Logging Format
The following is a sample of a WebQuilt log file, with a header row labeling the fields:
Time   From To Parent Code Frame Link Method URL + Query String
54730  0    1  -1     200  -1    -1   GET    http://www.berkeley.edu
109743 1    2  -1     200  -1    -1   GET    http://search.berkeley.edu/cgi-bin/regsearch.cgi words=EECS+department
122651 2    3  -1     200  -1    20   GET    http://www.eecs.berkeley.edu/
130171 3    4  -1     200  -1    1    GET    http://www.cs.berkeley.edu/
152491 4    5  -1     200  -1    22   GET    http://www.cs.berkeley.edu/Students/Classes/
161672 5    6  -1     200  -1    11   GET    http://www-inst.eecs.berkeley.edu/classes-cs.html
166771 6    7  -1     200  -1    19   GET    http://www-inst.EECS.Berkeley.EDU/~cs61b/
175773 7    8  -1     200  -1    4    GET    http://java.sun.com/products/jdk/1.2/docs/api/index.html
176197 8    11 8      200  0     -1   GET    http://java.sun.com/products/jdk/1.2/docs/api/overview-frame.html
176185 8    9  8      200  2     -1   GET    http://java.sun.com/products/jdk/1.2/docs/api/overview-summary.html
176191 8    10 8      200  1     -1   GET    http://java.sun.com/products/jdk/1.2/docs/api/allclasses-frame.html
267539 11   12 8      200  2     1633 GET    http://java.sun.com/products/jdk/1.2/docs/api/java/awt/event/WindowAdapter.html
351821 4    13 -1     200  -1    16   GET    http://www.cs.berkeley.edu/Research/Projects/
394752 4    14 -1     200  -1    6    GET    http://www.cs.berkeley.edu/People/alphabetical.shtml
409864 14   15 -1     200  -1    446  GET    http://www.cs.berkeley.edu/~landay/
422156 15   16 -1     200  -1    11   GET    http://guir.cs.berkeley.edu/
427076 16   17 -1     200  -1    1    GET    http://guir.berkeley.edu/projects/
442390 17   18 -1     200  -1    20   GET    http://guir.berkeley.edu/projects/webquilt/
Here's what the fields mean:
| Time | The amount of time, in milliseconds, since the start of the user's session. |
| From | The transaction ID of the previous page the user came from. |
| To | The current transaction ID. |
| Parent | The transaction ID of the current page's frame parent, or -1 if none. |
| Code | The HTTP response code. 200 means OK, 404 means page not found. |
| Frame | The frame number of the current page (i.e. the Nth frame in the parent frameset). -1 if the page is not a frame. |
| Link | The link the user clicked to get to this page (i.e. the Nth link on the page). This counts both <A> and <AREA> tags. This value is -1 if the page was not reached through a link. |
| Method | The HTTP method used to retrieve the page (e.g. GET or POST). |
| URL | The current URL. |
| Query | The query data sent along with the page request, if any. |
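For readers who want to process the logs with their own scripts, here is a minimal Python sketch of a line parser based on the field descriptions above. It assumes the fields are separated by whitespace and that any text after the URL is the query data. Neither assumption is confirmed by this document, and the names Transaction and parse_line are made up for the example.

```python
from collections import namedtuple

# Field names follow the table above; the type itself is hypothetical.
Transaction = namedtuple(
    "Transaction",
    "time from_id to_id parent code frame link method url query",
)

def parse_line(line):
    """Parse one log line into a Transaction.

    Assumes whitespace-separated fields, with anything after the URL
    treated as the query data (None if absent).
    """
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("expected at least 9 fields, got %d" % len(fields))
    numbers = [int(value) for value in fields[:7]]
    method, url = fields[7], fields[8]
    query = " ".join(fields[9:]) or None
    return Transaction(*numbers, method, url, query)

line = ("109743 1 2 -1 200 -1 -1 GET "
        "http://search.berkeley.edu/cgi-bin/regsearch.cgi words=EECS+department")
entry = parse_line(line)
print(entry.from_id, "->", entry.to_id, entry.url, entry.query)
```

Following the From and To values across parsed entries gives the path a user took through the site, for example to spot where the user backed up to an earlier page.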