(Last Update: 9/1/04)
A Sample Log Entry (extended format; Notes Domino server)
64.12.102.33 www.esc.edu - [16/dec/2003:00:04:09 -0500] "get /esconline/online2.nsf/eschome?openform http/1.1" 200 21691 "http://www.google.com/search?hl=en&ie=iso-8859-1&q=empire+state+college" "mozilla/4.0 (compatible; msie 5.0; aol 8.0; windows 98; digext)" 203 "" "e:/lotus/domino/data/esconline/online2.nsf"
The Same Entry, Broken into Fields
| Field Name | Sample Entry |
| --- | --- |
| remotehost: | 64.12.102.33 |
| rfc931: | www.esc.edu |
| authuser: | - |
| date: | [16/dec/2003:00:04:09 -0500] |
| request: | "get /esconline/online2.nsf/eschome?openform http/1.1" |
| status: | 200 |
| bytes: | 21691 |
| referrer: | "http://www.google.com/search?hl=en&ie=iso-8859-1&q=empire+state+college" |
| user-agent: | "mozilla/4.0 (compatible; msie 5.0; aol 8.0; windows 98; digext)" |
| processing time: | 203 |
| cookies: | "" |
| translated URL: | "e:/lotus/domino/data/esconline/online2.nsf" |
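The field breakdown above can be reproduced programmatically. The following is a minimal Python sketch, assuming every entry follows exactly the twelve-field layout of the sample; the names `LOG_PATTERN` and `parse_entry` are illustrative and not part of any server or analysis product. Other servers, or other Domino settings, may log different fields, so the pattern would need adjusting.

```python
import re

# One named group per field, in the order shown in the table above.
# Assumption: unquoted fields contain no spaces, and quoted fields
# contain no embedded double quotes.
LOG_PATTERN = re.compile(
    r'(?P<remotehost>\S+) '
    r'(?P<rfc931>\S+) '
    r'(?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)" '
    r'(?P<processing_time>\d+) '
    r'"(?P<cookies>[^"]*)" '
    r'"(?P<translated_url>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of field name -> text, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# The sample entry from the top of this page.
SAMPLE = (
    '64.12.102.33 www.esc.edu - [16/dec/2003:00:04:09 -0500] '
    '"get /esconline/online2.nsf/eschome?openform http/1.1" 200 21691 '
    '"http://www.google.com/search?hl=en&ie=iso-8859-1&q=empire+state+college" '
    '"mozilla/4.0 (compatible; msie 5.0; aol 8.0; windows 98; digext)" '
    '203 "" "e:/lotus/domino/data/esconline/online2.nsf"'
)

fields = parse_entry(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

All values are returned as text; converting `status`, `bytes`, and `processing_time` to numbers is left to the caller, since `bytes` may be logged as "-" on some servers.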
Definitions
- Authuser
- The username with which the user has logged in. It is recorded only if the person requested a page that required a login; if the person didn't log in, "-" is recorded.
- bytes
- The size, in bytes, of the file served to the user.
- Clickstream (a.k.a. clickpath)
- The sequence of pages and activities that a user takes through a site, as he/she selects one link after another.
- Common Log File Format
- The standard set of data recorded for each hit to an HTTP server. Data fields are recorded in the following order for each hit, with each field separated by a single space:
remotehost rfc931 authuser [date] "request" status bytes
- Cookies
- In the server log, the name of the cookie used; if there were no cookies, "" is logged.
- Date
- The date and time of the request.
- Domain (a.k.a. domain name)
- The text name that corresponds to the IP address of the remotehost. Not all IP addresses have a domain name.
- Domain Name Lookup (DNL)
- The process of looking up an IP address and getting its domain name.
- Entrance Page
- The URL of the page on which a user session begins.
- Exit Page
- The URL of the page on which a user session ends.
- Extended Log File Format
- Expands the common log format to include additional fields. Data fields are recorded in the following order for each hit, with each field separated by a single space:
remotehost rfc931 authuser [date] "request" status bytes "referrer" "user-agent"
May also include other data, depending on the server and its settings.
- Hit
- One request from a web browser for one file on a web server. The requests can be for every kind of file on your web site, including html, graphics, style sheets, audio files, scripts and dynamically generated pages (.cgi, .asp, .pl, .js, etc.). Web pages are often a combination of text, graphics, etc., so a request for one web page may be recorded as a series of several hits in the server log – one hit for each file requested.
- Metric
- A measurement of a characteristic of an object or activity. The measurement is done using a consistent method, at consistent intervals, in order to assess, monitor and/or communicate information about the object or activity. Metrics may be quantitative or qualitative.
- Page
- The files that are considered the actual documents of the web site, including forms and dynamically generated information. In other words, not the graphics, style sheets and the other files that go into the documents. Pages are generally html files, and, depending on the server, may also be .cgi, .asp, .pl, etc.
- Page view (a.k.a. pageview)
- A hit to a file that is considered a page.
- Processing Time
- The time in milliseconds to process the request.
- Referrer (a.k.a. referral and http_referrer)
- The URL of the previous page, if the user clicked on a link on that page to request the page recorded as the "request". If the user requested the page by typing the URL directly into the browser’s “location” box, or selected it from a “Favorites” or “Bookmarks” list, this field is blank or a "-" is recorded.
- Remotehost
- The numeric IP address of the computer used to access the web site. Most IP addresses can be translated into domain names (for example, “cache-mtc-af03-proxy.aol.com”), which can indicate how the person is accessing the site, from what kind of organization, from what country. If the person accesses the site via an Internet Service Provider, the address will be for a computer on that service provider’s network, and not the specific person’s computer. Neither the IP address nor the domain name indicates the person’s actual identity, or even if it's the same person all the time.
- Request
- In the server log, the request line for a particular URL, exactly as it came from the user’s browser.
Typically starts with "Get" or "Post"; ends with "http/1.0" or "http/1.1".
- rfc931
- The remote logname of the user.
- Server Log
- A file that records all the requests for files made to a specific server and the result of each request.
- Session (a.k.a. user session or visit)
- All the requests made by a unique user within a chosen time limit. A session is determined by calculating the time between one request from a remotehost and the next request by the exact same remotehost. If the time is greater than the chosen limit, or if that specific remotehost has never made a request before, then a new session has begun. If the time between one request and the next is less than the limit, the request is part of the same session. The time (or timeout) limit can be any length. If the limit is 30 minutes, and a remotehost makes a request at 1:00 p.m., and the next request from that remotehost comes at 3:45 p.m., the requests would be counted as two separate sessions, because the gap of 2 hours 45 minutes exceeds the limit.
- Site
- Another name for the remotehost. NOT equivalent to unique visitors.
- Status (a.k.a. return code)
- The HTTP status code for the request made to the server. Codes in the "200" and "300" ranges indicate that the request was handled successfully (including redirects); codes in the "400" and "500" ranges indicate some kind of error.
- Unique Visitors (a.k.a. unique users)
- The number of individual users who visit the site. The count is based on requests from unique IP addresses.
- User-agent
- The browser and platform used to make a request to the server (including spiders and web bots).
- visit
- See "session"
- Web Metrics
- Metrics used to assess and monitor activity on a web site, usually to study how well the site meets its objectives. Sometimes also called "web analytics," although some writers make distinctions between metrics and analytics. See also "metric."
- Web Traffic
- The amount of activity on a website; usually measured in visitors or page views.
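The session rule defined above (a new session starts when a remotehost is seen for the first time, or when the gap since its previous request exceeds the chosen limit) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `build_sessions`, the tuple layout, and the page URLs are hypothetical, and a gap exactly equal to the limit is treated here as part of the same session, a case the definition leaves open. The first and last page of each session correspond to the entrance page and exit page.

```python
from datetime import datetime, timedelta


def build_sessions(requests, timeout=timedelta(minutes=30)):
    """Group (remotehost, time, url) tuples into sessions per remotehost."""
    sessions = []       # every session, in order of its first request
    latest = {}         # remotehost -> that host's most recent session
    for host, when, url in sorted(requests, key=lambda r: r[1]):
        current = latest.get(host)
        # New session: host never seen before, or gap greater than the limit.
        if current is None or when - current["last_seen"] > timeout:
            current = {"remotehost": host, "pages": [], "last_seen": when}
            sessions.append(current)
            latest[host] = current
        current["pages"].append(url)
        current["last_seen"] = when
    return sessions


# The 1:00 p.m. / 3:45 p.m. example from the definition, plus one extra
# request at 1:10 p.m. The URLs are made up for illustration.
requests = [
    ("64.12.102.33", datetime(2003, 12, 16, 13, 0), "/home"),
    ("64.12.102.33", datetime(2003, 12, 16, 13, 10), "/courses"),
    ("64.12.102.33", datetime(2003, 12, 16, 15, 45), "/home"),
]

sessions = build_sessions(requests)
for number, session in enumerate(sessions, start=1):
    entrance_page = session["pages"][0]
    exit_page = session["pages"][-1]
    print(number, session["remotehost"], entrance_page, exit_page)
```

With a 30-minute limit, the first two requests fall into one session and the 3:45 p.m. request starts a second one. Because sessions are keyed on remotehost alone, several people behind one proxy address would be merged, which is the same limitation noted under "Remotehost".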
Server Log Analysis Software and/or Services
NOTE: The products and/or services listed below are by no means the only choices available. They are just some of the more widely used ones -- no endorsement or recommendation is intended. In addition to information about the software product, their sites often contain helpful FAQs and articles about web server statistics and other topics related to web metrics.
Helpful Books, Articles, etc.
Goldwyn, Craig. "Understanding Web Log Statistics and Metrics," http://visibility.tv/tips/stats.html.
Inan, Hurol. Measuring the Success of Your Website, Longman, 2002.
Companion website: http://www.hurolinan.com/book/default.asp.
Peterson, Eric. Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business, Celilo Group Media, 2004.
Companion website (includes online discussion and blog): http://www.webanalyticsdemystified.com/
Poon, Alex, Pearce, Ben, Comber, Peter, Fletcher, Peter. Practical Web Traffic Analysis: Standards, Privacy, Techniques, and Results, APress, 2003.
Sterne, Jim. Web Metrics: Proven Methods for Measuring Web Site Success, Wiley Publishing Inc., 2002.
--, "Web Metrics Versus Web Analytics," http://www.marketingprofs.com/4/sterne14.asp, March 16, 2004.