This page describes the Referer Log Analysis Program written by BareMetal: what information it provides and how to use it. It also
describes what the logs are, and what is in them.
The analysis program has three main functions:
- It will tell you how many references came from which host.
- It will tell you how many references came from which URL.
- It will analyze the search strings that it can find (and recognize) in the referer log and produce a sorted "hit list" of the words
searched on.
Additionally, the program has some very flexible search capabilities (restricting
output to links coming FROM or NOT FROM certain hosts/URLS, and going TO or NOT TO
certain URLS), and some output control features (show raw log entry, or complete
search string).
Now that we've covered the executive summary, let's move on to the introduction:
What is a Referer Log?
When a modern web browser follows a link from one page to another (or even to a graphic), it tells the web server where it found the
link. This is called the referring page. A referer log is simply a log file containing all these referring pages, and the URL they led
to. Here's a very short example:
http://groucho.gcal.ac.uk/SupportStuff/mac-specific.html#macsupport -> /ISO/ISOmain.html
http://web.mit.edu/mugs/www/fmug.htm -> /ISO/ISOmain.html
http://www.mindspring.com/~fmpro/reference.html -> /ISO/ISOmain.html
[Actually, it's not all the references. We try to ignore the references within a site because we're most interested in the links that
brought a browser into your site from somewhere outside of it.]
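If you want to read the raw entries yourself, each line in the format shown above splits cleanly on the " -> " separator. Here is a minimal Python sketch of that idea. The names RefererEntry and parse_referer_line are made up for this illustration and are not part of the BareMetal program, and the sketch only handles the plain two-field format (not the BareMetal variant with the access time added).

```python
from typing import NamedTuple, Optional


class RefererEntry(NamedTuple):
    referer: str  # the page the browser came from
    target: str   # the local path that was requested


def parse_referer_line(line: str) -> Optional[RefererEntry]:
    """Split one 'referer -> target' line; return None if it does not match."""
    referer, sep, target = line.strip().partition(" -> ")
    if not sep or not referer or not target:
        return None  # not a two-field referer log line
    return RefererEntry(referer, target.strip())


sample = "http://web.mit.edu/mugs/www/fmug.htm -> /ISO/ISOmain.html"
entry = parse_referer_line(sample)
print(entry.referer)
print(entry.target)
```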
At BareMetal we've modified the above format to include the access time, so that you can link the referer.log entry into the access
logs and track a particular browser through the site from their first contact...
What good is it?
How useful the referer log is depends on what your site is used for.
If your site is an online brochure that you personally refer people to, then it may not be very useful to you.
If your site is used to bring in prospective clients, then you are probably interested in finding out how people got to your site, so
that you can try to bring in more people.
What will it tell me?
The referer log can be used to find out which hosts, URLs, and search engines, and (in some cases) what search text, are bringing clients
into your site.
Knowing where your visitors are coming from can help you tailor your site to match the visitor.
Knowing which links they follow in can tell you whether an exchanged link or a purchased link is working.
Knowing what queries people are entering into search engines can help you write your pages to fit those queries so that you can rank
higher in the search results.
OK, I'm sold. How do I use it?
(I knew you'd come around :-)
Everyone:
The referer log capability is an optional part of the Apache and NCSA servers. You might have to recompile your webserver before you
can configure the referer log. At that point you will have the raw data from which to get the information referred to above.
BareMetal clients:
We have (of course) set up the referer log capabilities for all of our clients. The location of the raw data file is described in the
original welcoming document you got, and the welcoming checklist.
The above steps will lead you to the RAW log files.... These tend to be very large and hard to read.
The BareMetal Referer Log Analysis Program:
Simplicity :-). This program has three main functions:
- It will tell you how many references came from which host.
- It will tell you how many references came from which URL.
- It will analyze the search strings that it can find (and recognize) in the referer log and produce a sorted "hit list" of the words
searched on.
Additionally, the program has some very flexible search capabilities (restricting output to links coming FROM or NOT FROM certain
hosts/URLS, and going TO or NOT TO certain URLS), and some output control features (show raw log entry, or complete search string).
How to start it!
The important part! :-) This is easy. Put in the name of your server in the following:
http://your.server/sec-bin/referer.pl
Don't forget that it's password protected with your FTP userid and password (sorry visitors, examples are coming).
How to control it
I think the easiest way to describe using the program is with some examples. First, the start up screen:
Use lower case only:
Please note the "/www" under Target URL. (I've restricted the analysis to hits that brought browsers into the /www subdirectory.)
Now a little sample output:
Referer stats for server home.baremetal.com:
Host Counts:
28 webcrawler.com
19 guide-p.infoseek.com
15 lycos.com
13 altavista.digital.com
12 excite.com
4 kudosnet.com
<SNIP>
URL Counts:
19 http://guide-p.infoseek.com/Titles
15 http://www.webcrawler.com/cgi-bin/WebQuery
15 http://www.lycos.com/cgi-bin/pursuit
13 http://webcrawler.com/cgi-bin/WebQuery
12 http://www.excite.com/search.gw
7 http://www.altavista.digital.com/cgi-bin/query
6 http://altavista.digital.com/cgi-bin/query
3 http://www.kudosnet.com/portfolio/
<SNIP>
TEXT search WORD Counts:
27 web
25 hosting
8 Web
7 Hosting
5 resell
5 host
4 service
4 baremetal
3 server
2 virtual
2 Service
<SNIP>
Sigh, I hate it when two day old pages are out of date :-). The URL listing above is now listed as active <a href=...> links,
and if you turn on the raw log entries, each URL is shown completely... and can be followed backwards into the search page or remote
page with a link that the client followed.
Let's look at the output first. Almost all of our hits are coming in from search engines. This makes a lot of sense when you consider
that most people linking to the site link to the top page, and we've restricted our analysis to links coming into the /www directory.
You say there's a discrepancy between the host report and the URL report! (Good eyes :-) Right and wrong. The host report is a summary,
and we dropped any www. prefixes from host names, so www.webcrawler.com and webcrawler.com got combined for the host counts, but are
reported separately for the URL listing.
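To make the difference between the two reports concrete, here is a rough Python sketch of how such counts could be produced. It is not the program's actual code. The helper names host_key and url_key are invented for this example, the "searchText" query parameter in the sample data is hypothetical, and dropping the query string for the URL count is an assumption based on the sample output above.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_key(referer: str) -> str:
    """Host name with any leading 'www.' dropped, as in the host report."""
    host = urlsplit(referer).hostname or ""
    return host[4:] if host.startswith("www.") else host


def url_key(referer: str) -> str:
    """Referring URL without its query string (assumed URL report key)."""
    parts = urlsplit(referer)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


# Made-up referers; the "searchText" parameter name is hypothetical.
referers = [
    "http://www.webcrawler.com/cgi-bin/WebQuery?searchText=web+hosting",
    "http://webcrawler.com/cgi-bin/WebQuery?searchText=hosting",
    "http://www.webcrawler.com/cgi-bin/WebQuery?searchText=resell",
]

host_counts = Counter(host_key(r) for r in referers)
url_counts = Counter(url_key(r) for r in referers)

for host, count in host_counts.most_common():
    print(count, host)
for url, count in url_counts.most_common():
    print(count, url)
```

With this data the host report has a single webcrawler.com line, while the URL report keeps the www. and non-www. URLs apart, which is the same effect as in the sample output.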
The "TEXT search WORD Counts" indicate that the searches leading folks into this area are very heavily weighted towards "web hosting"
which makes sense. It's a good short summary of what people are looking for.
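For the curious, a word count like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's real logic. The function name search_words is invented, and the list of query parameter names ("q", "query", "searchText") is a guess for the example, since each search engine uses its own. Words are counted case-sensitively, which matches the separate "web" and "Web" lines in the sample output.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def search_words(referer, query_keys=("q", "query", "searchText")):
    """Return the words of any recognized search parameter in a referring URL.

    The parameter names in query_keys are assumptions for this sketch.
    """
    words = []
    for key, value in parse_qsl(urlsplit(referer).query):
        if key in query_keys:
            words.extend(value.split())  # '+' has already been decoded to a space
    return words


# Made-up referers on a placeholder host.
referers = [
    "http://example.com/search?q=web+hosting",
    "http://example.com/search?q=Web+hosting",
    "http://example.com/page.html",  # no search string, contributes nothing
]

word_counts = Counter(word for r in referers for word in search_words(r))
for word, count in word_counts.most_common():
    print(count, word)
```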
Now, can you see the correspondence between the three check boxes on the form, and the three output areas (host count, URL count, word
count)?
Some comments about the require and exclude fields. These are actually space-separated lists, and the match is simply a substring
match.
Usually you would list a host name (or several) that you either wanted to analyze, or exclude from the analysis, in the "Refering
URL" column. But this is more flexible.... You could enter a require value of "hosting" or "cgi-bin" (both part of the URLs of
referer entries from the search engines) to get some of the search engine entries.
The Target URL column behaves the same way, but controls the destination of the reference ... which is obviously on your server,
so the host name isn't even part of the log entry... just the directory/file name component.
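The require and exclude behaviour can be pictured with a small Python sketch. Again, this is an illustration and not the program's code. The function name matches is made up, and treating a require list as "any one of these terms is enough" is an assumption on my part.

```python
def matches(value: str, require: str = "", exclude: str = "") -> bool:
    """Substring filter driven by space-separated require/exclude lists.

    Assumption: one matching require term is enough; any exclude term rejects.
    """
    required = require.split()
    excluded = exclude.split()
    if required and not any(term in value for term in required):
        return False
    return not any(term in value for term in excluded)


# (referring URL, target URL) pairs, made up for the example.
entries = [
    ("http://www.lycos.com/cgi-bin/pursuit", "/www/index.html"),
    ("http://www.kudosnet.com/portfolio/", "/www/index.html"),
    ("http://www.excite.com/search.gw", "/ISO/ISOmain.html"),
]

# Keep search engine style referers that led into the /www subdirectory.
selected = [
    (referer, target)
    for referer, target in entries
    if matches(referer, require="hosting cgi-bin") and matches(target, require="/www")
]
print(selected)
```

The same function serves both columns: the first call filters on the referring URL, the second on the target path.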
The final two check boxes are for detail work.
The "Display Query Terms" check box will display each actual search string as it is encountered.
The "Display Raw Log Entry" check box will display the complete line from the referer log. This is probably best used with restrictions on the
referring or target URLs.... if you wanted the complete log file it would be faster to just download it :-).
I think that's it!
Bug reports, questions, purchase requests :-)
Send it all to support@baremetal.com.