
AWStats: Traffic Report Analysis

Analyzing the data presented in the traffic report. The analysis is organized by report heading.

AWStats Report Headings

An AWStats traffic report contains a great deal of useful information.

When analyzing web statistics, take into account any regional, national, or international holidays, festivals, religious events, major sporting events, and seasonal trends.

Pages not Found

This is the first place I look. It is amazing how many broken links exist on websites. Be sure to visit AWStats: Page not found to see a list of problems and causes.

Note that if you password-protected your awstats directory (recommended), you may see many not-found entries for icons and images in the awstats directory.

Hosts

This is where you can discover spambots and bandwidth hogs.

What to look for

The Hosts section has five columns: Host, Pages, Hits, Bandwidth, and Last visit.

If the Pages and Hits values are the same or nearly the same, the host is probably grabbing only one or two pages from your site, either for hotlinking or as a splogger stealing your content through an RSS feed. Typical visitors request not only the web page but also supporting items such as images and CSS files.
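The pages-versus-hits check can be sketched in Python. This is a minimal sketch, assuming you have copied the Hosts table into (host, pages, hits) tuples; the function name `flag_thin_hosts`, the 1.2 ratio cutoff, and the sample rows are hypothetical, not AWStats settings or real data.

```python
from typing import Iterable


def flag_thin_hosts(rows: Iterable[tuple[str, int, int]], max_ratio: float = 1.2) -> list[str]:
    """Return hosts whose hit count is equal or nearly equal to their page count.

    rows: (host, pages, hits) tuples copied from the AWStats Hosts table.
    max_ratio: hypothetical cutoff; a hits/pages ratio at or below it is flagged.
    """
    flagged = []
    for host, pages, hits in rows:
        if pages == 0:
            continue  # ratio undefined when no pages were requested; skip
        if hits / pages <= max_ratio:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        ("203.0.113.5", 40, 41),   # about one hit per page: suspicious
        ("192.0.2.10", 12, 95),    # pages plus images and CSS: typical visitor
    ]
    print(flag_thin_hosts(sample))  # ['203.0.113.5']
```

A normal visitor's ratio sits well above 1 because each page pulls in images and style sheets, which is why a ratio near 1 stands out.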

Look up the IP address to decide whether it is one you want to ban. We use a whois domain tool for IP or domain analysis. The IP address in the example below came from an unknown source in China; over the course of one day it consumed 10 MB of bandwidth.

We ban any unknown IP address or domain that uses more than 2 MB of bandwidth. Make sure you do not ban your own IP address!

How to Block an IP Address

The example below shows how to block IP addresses or domains in an .htaccess file (used on Apache servers).

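#START Blocking
# Apache 2.2 access syntax (on Apache 2.4 it requires mod_access_compat).
# With "allow,deny", Allow lines are evaluated first and any matching Deny line wins.
order allow,deny
deny from 60.12.136.66
# A leading dot blocks a whole domain; Apache matches it with a double reverse DNS lookup of the client IP.
deny from .mydomain.com
allow from all
#END Blocking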
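The 2 MB rule and the blocking section above can be combined in a short script. This is a minimal sketch, assuming per-host bandwidth figures in bytes copied from the Hosts table, 1 MB taken as 1,048,576 bytes, and a set of trusted hosts that must never be banned (including your own IP address); `build_block`, `BAN_THRESHOLD_BYTES`, and the sample addresses are hypothetical. It emits the same Apache 2.2-style lines used above.

```python
# Assumed: 1 MB = 1,048,576 bytes; 2 MB is the ban threshold used in this article.
BAN_THRESHOLD_BYTES = 2 * 1024 * 1024


def build_block(hosts: dict[str, int], trusted: set[str]) -> str:
    """Return an .htaccess blocking section for untrusted hosts over the threshold.

    hosts: host (IP address or domain) -> bandwidth in bytes for the period.
    trusted: hosts never to ban, e.g. your own IP address.
    """
    offenders = sorted(
        host for host, used in hosts.items()
        if used > BAN_THRESHOLD_BYTES and host not in trusted
    )
    lines = ["#START Blocking", "order allow,deny"]
    lines += [f"deny from {host}" for host in offenders]
    lines += ["allow from all", "#END Blocking"]
    return "\n".join(lines)


if __name__ == "__main__":
    usage = {  # made-up figures for illustration
        "203.0.113.5": 10 * 1024 * 1024,   # 10 MB in one day
        "198.51.100.7": 3 * 1024 * 1024,   # pretend this is your own IP
        "192.0.2.10": 500 * 1024,          # ordinary visitor
    }
    print(build_block(usage, trusted={"198.51.100.7"}))
```

Review the generated list (and run a whois lookup on each entry) before pasting it into your .htaccess file.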

The host analysis spreadsheet resource is a helpful tool for IP analysis.

Search Keyphrases

Lists the keywords and phrases that people typed into search engines to reach your website. This shows how the keywords you think matter compare with the ones the public actually uses.

Robots/Spiders visitors

Shows the top robots and spiders. Make sure the important ones, such as Googlebot and Yahoo Slurp, are visiting. If they are not, they may be blocked, or your site may be banned.

Connect to site from

Direct Address / Bookmark / Link in Email...

The request arrives without a referring URL. This happens when someone types the URL into the address bar, uses a bookmark, or clicks a link in an email. Since there is no referrer URL, you cannot easily find the origin.

Links from an Internet Search Engine

You can tell which search engines are sending the most traffic by the Pages count. If you don't see the major search engines, for example Yahoo, your site may be banned or have other problems.

Links from an external page

These are links on other sites that someone clicked or otherwise invoked (such as an ad) to reach your site. There are two columns of numbers: Pages and Hits. Pages is the number of pages viewed on your site through that link, while Hits is the total number of requests, including images and style sheets (see Hits vs. Pages below). You can use this list to:

  • Check ad campaigns
  • See what other sites are saying about you
  • Keep track of reciprocal links

If you see odd URLs, they are probably referer (sic) spam. DO NOT click on them; that is their intention. Also, since some sites publish their server logs or leave them publicly accessible (i.e. to search engines), the spammer's URL becomes an inbound link from your site, giving them slight link juice at no cost to them and without your permission.

To rid your reports and site of referrer spam, you can modify the .htaccess file on an Apache server:

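Options +FollowSymLinks
RewriteEngine On
# Each condition tests the Referer header; [NC] = case-insensitive, [OR] = any one condition may match.
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spammersite1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spammersite2\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spammersite3\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spammersite4\.com [NC]
# Answer matching requests with 403 Forbidden ([F]) and stop processing further rules ([L]).
RewriteRule .* - [F,L]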

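Replace spammersite1.com through spammersite4.com with the bad referrer sites, adding or removing RewriteCond lines as needed and keeping [OR] on every condition except the last. Requests referred by those sites receive a 403 Forbidden error instead of your page. They are still written to the raw access log with status 403, but AWStats by default counts only 200 and 304 responses (its ValidHTTPCodes setting), so they no longer appear in the referrer report.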
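Before uploading the rules, you can check which referrers a pattern would catch. This is a minimal sketch in Python, assuming patterns written in the style of the RewriteCond lines above, with the [NC] flag approximated by `re.IGNORECASE` and [OR] chaining by `any()`. Python's `re` module is not Apache's regex engine, so treat this as a sanity check only; `SPAM_PATTERNS`, `is_blocked`, and the domains are hypothetical.

```python
import re

# Hypothetical spam domains; mirror your own RewriteCond patterns here.
SPAM_PATTERNS = [
    r"^https?://(www\.)?spammersite1\.com",
    r"^https?://(www\.)?spammersite2\.com",
]
# [NC] in mod_rewrite means case-insensitive, so compile with IGNORECASE.
COMPILED = [re.compile(p, re.IGNORECASE) for p in SPAM_PATTERNS]


def is_blocked(referrer: str) -> bool:
    """True if any pattern matches, like RewriteCond lines chained with [OR]."""
    return any(p.search(referrer) for p in COMPILED)


if __name__ == "__main__":
    for ref in ["http://www.SpammerSite1.com/page", "https://example.org/", ""]:
        print(repr(ref), is_blocked(ref))
```

An empty referrer (a direct visit) is never blocked, which is the behavior you want.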

For a detailed explanation, see the Stop Referrer Spam page, which shows how to block bad referrer URLs. It applies to Apache servers only.

In full view mode, do you have entries with no Pages value (i.e. only Hits)? If so, that site is being served something other than web pages, usually images. One possibility is that the site is hot-linking (stealing) your images; in other words, it displays your images on its own pages while your server supplies the bandwidth. It can also be a search engine indexing your images.
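The no-pages check is easy to automate if you copy the external-referrer table into (URL, pages, hits) rows. This is a minimal sketch; the function name `hits_only_referrers` and the sample rows are hypothetical. It only lists candidates, so you still need to decide for each one whether it is a hot-linker or an image search engine.

```python
def hits_only_referrers(rows: list[tuple[str, int, int]]) -> list[tuple[str, int]]:
    """Return (referrer URL, hits) for entries with hits but no pages, most hits first.

    rows: (referrer URL, pages, hits) from the "Links from an external page" table.
    """
    suspects = [(url, hits) for url, pages, hits in rows if pages == 0 and hits > 0]
    return sorted(suspects, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        ("http://forum.example.net/thread/42", 0, 310),  # images only
        ("http://blog.example.org/review", 14, 120),     # real visitors
        ("http://images.example.com/search", 0, 25),     # image search
    ]
    for url, hits in hits_only_referrers(sample):
        print(hits, url)
```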

Links from an internal page

Shows the clicks on internal links on your website.

Unknown origin

Pages - URL

The top 10 web pages are shown, ordered by number of views. There are four columns: Viewed, Entry, Exit, and Average size.

Viewed

The number of times the page was viewed. The most popular are at the top.

Entry

Clicking the Entry link orders the pages by entry count, that is, by how often each page was the first page of a visit. This is very valuable information, since it shows what people see when they first arrive at your site, whether through search engine results, email links, social bookmarks, or other sources.

Exit

This shows the *last* page someone viewed before leaving your site.
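Entry and exit counts come from the order of pages within each visit. This is a minimal sketch, assuming the visits have already been grouped into ordered lists of page URLs (AWStats does its own visit grouping from the log; that step is not shown); `entry_exit_counts` and the sample visits are hypothetical.

```python
from collections import Counter


def entry_exit_counts(visits: list[list[str]]) -> tuple[Counter, Counter]:
    """Count how often each page starts (entry) and ends (exit) a visit.

    visits: each visit is the ordered list of pages one visitor viewed.
    """
    entries: Counter = Counter()
    exits: Counter = Counter()
    for pages in visits:
        if not pages:
            continue  # skip empty visits
        entries[pages[0]] += 1   # first page viewed
        exits[pages[-1]] += 1    # last page viewed before leaving
    return entries, exits


if __name__ == "__main__":
    visits = [
        ["/index.html", "/faq.html"],
        ["/faq.html"],  # single-page visit: counts as both entry and exit
        ["/index.html", "/tasks.html", "/faq.html"],
    ]
    entries, exits = entry_exit_counts(visits)
    print(entries.most_common())  # [('/index.html', 2), ('/faq.html', 1)]
    print(exits.most_common())    # [('/faq.html', 3)]
```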

HTTP Status Codes

Any other error codes that show up here should be tracked down, too.

401 Unauthorized

Someone tried to access a page or graphic in an area that requires a login.

404 Document not Found

Definitely check out the 404 (document not found) codes. They are an indication of broken links, missing pages, missing graphics, or an attack attempt.
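To see which URLs are producing 404 errors, you can also scan the raw access log yourself. This is a minimal sketch, assuming Apache's common or combined log format; the regular expression, the name `not_found_urls`, and the sample lines are hypothetical, and lines that do not match the pattern are simply skipped.

```python
import re
from collections import Counter
from typing import Iterable

# host ident user [time] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3})')


def not_found_urls(lines: Iterable[str]) -> Counter:
    """Count requested paths that returned status 404."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up log lines for illustration
        '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET /old-page.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
        '192.0.2.10 - - [10/Oct/2008:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '192.0.2.10 - - [10/Oct/2008:13:56:02 -0700] "GET /old-page.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    ]
    print(not_found_urls(sample).most_common())  # [('/old-page.html', 2)]
```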

Miscellaneous

Add to favorites

This is an estimate of how many times someone has added one of your website pages to their bookmarks. AWStats bases it on the number of requests for the favicon.ico file, so this indicator is nebulous.

Hits vs. Pages

A web site's home page is actually a group of files, for example: one HTML file (index.html), one style sheet that controls formatting (CSS), six image files (GIF, ICO, and PNG), and some dynamic client-side logic (JavaScript) stored in two separate files on the web server. Calling up this home page results in ten file requests (1 + 1 + 6 + 2) to the web server, and thus ten hits:

Hit
A hit is a successful request for an object from a web server. A successful request usually returns status code 200 or, for objects identical to those already in the user's cache, 304.

Along with bandwidth consumption, hits can be useful as an input for server sizing and capacity planning. While people make much of hits to tout the success of a site, hits have no intrinsic business value. Claims to the contrary probably reflect a misunderstanding of how little hits say about business performance.

As the Internet has matured, attention has turned from hits to the more meaningful measure of pages. Unfortunately, there is no standard definition of a page. A web server log file simply contains information on objects requested from the web server. It is up to the web server log file analysis software to give semantic meaning to those objects.

Page
Generally a page is a content object that a user viewed, such as an HTML file, a word processing document, or an Adobe Acrobat PDF file.

AWStats defines a page by exclusion. By default, any object a user requests from your web server counts as a page unless its file suffix is listed in the NotPageList parameter. You must explicitly add the suffixes of any other objects you do not want counted as pages in AWStats reports. For example, to exclude ZIP archives and Flash animation files, add their suffixes to the NotPageList directive in the AWStats configuration file:

NotPageList="css js class gif jpg jpeg png bmp ico swf zip tgz gz tar rss xml rdf"

Then AWStats will count everything but the following as pages:

Files not counted as pages

Suffix                        Description
css                           Cascading Style Sheet formatting instruction files
js                            JavaScript files
class                         Java program (class) files
gif, jpg, jpeg, png, bmp      Various image/photo formats
ico                           Icon image file; many sites save a company logo as favicon.ico, which many browsers show in bookmarks (favorites) and tabs
swf                           Shockwave Flash animation
zip, tgz, gz, tar             Archive formats created by PKZip, WinZip, tar, gzip, and others
rss                           RSS news feed
xml                           XML file
rdf                           Resource Description Framework (RDF) file
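The exclusion rule can be sketched in a few lines of Python and applied to the ten-file home page described earlier. This is a simplified model, not AWStats's actual code: it assumes a request is a page unless its file suffix (compared case-insensitively here) is in NotPageList, and it ignores query strings; `NOT_PAGE_LIST`, `is_page`, `count_hits_and_pages`, and the sample file names are hypothetical.

```python
from urllib.parse import urlsplit

# Suffixes from the NotPageList line above.
NOT_PAGE_LIST = set("css js class gif jpg jpeg png bmp ico swf zip tgz gz tar rss xml rdf".split())


def is_page(url: str) -> bool:
    """Treat a request as a page unless its file suffix is in NOT_PAGE_LIST."""
    path = urlsplit(url).path          # drop any ?query string
    name = path.rsplit("/", 1)[-1]     # last path segment, e.g. "logo.png"
    if "." not in name:
        return True                    # "/" or "/about": no suffix, so a page
    suffix = name.rsplit(".", 1)[-1].lower()
    return suffix not in NOT_PAGE_LIST


def count_hits_and_pages(urls: list[str]) -> tuple[int, int]:
    """Every successful request is a hit; only non-excluded ones are pages."""
    return len(urls), sum(is_page(u) for u in urls)


if __name__ == "__main__":
    home_page = [
        "/index.html", "/style.css",
        "/logo.gif", "/favicon.ico", "/a.png", "/b.png", "/c.png", "/d.png",
        "/menu.js", "/stats.js",
    ]
    print(count_hits_and_pages(home_page))  # (10, 1)
```

Because a CGI script's suffix (for example .cgi or .php) is not in the list, every request to it counts as a page without any extra setup.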

One advantage of this approach is that if you use a CGI program to generate dynamic pages, you do not have to worry about getting each CGI request counted as a page; this happens automatically.

Bandwidth consumption

Bandwidth consumption is of interest to technical staff, as there is usually an economic cost associated with its use. On a more granular level, large individual files can point to performance problems, especially for dial-up users.

Bandwidth
The total number of bytes sent from the web server to the end user. This does not include HTTP headers in served objects, HTTP request headers from users, or bytes needed by the underlying network protocols.
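Bandwidth as defined here closely matches the response-size field (%b) in Apache's common and combined log formats, which excludes HTTP headers and is written as "-" when no body was sent. This is a minimal sketch that sums that field per host, assuming those log formats; `bandwidth_by_host` and the sample lines are hypothetical, and malformed lines are skipped.

```python
from collections import defaultdict
from typing import Iterable


def bandwidth_by_host(lines: Iterable[str]) -> dict[str, int]:
    """Sum the %b (response body bytes) field per client host."""
    totals: dict[str, int] = defaultdict(int)
    for line in lines:
        parts = line.split('"')
        # Expected: [host ident user [time] , request line, " status bytes ", ...]
        if len(parts) < 3:
            continue  # not a common/combined log line
        prefix_fields = parts[0].split()
        status_and_size = parts[2].split()
        if not prefix_fields or len(status_and_size) < 2:
            continue
        size = status_and_size[1]
        if size != "-" and not size.isdigit():
            continue
        # "-" means no body was sent, which counts as zero bytes.
        totals[prefix_fields[0]] += 0 if size == "-" else int(size)
    return dict(totals)


if __name__ == "__main__":
    sample = [  # made-up log lines for illustration
        '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '203.0.113.5 - - [10/Oct/2008:13:55:37 -0700] "GET /logo.gif HTTP/1.1" 304 - "-" "Mozilla/5.0"',
        '192.0.2.10 - - [10/Oct/2008:13:56:01 -0700] "GET /big.zip HTTP/1.1" 200 2097152 "-" "Mozilla/5.0"',
    ]
    print(bandwidth_by_host(sample))  # {'203.0.113.5': 5120, '192.0.2.10': 2097152}
```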

Next: Tasks