A recent blog post of Joel Oleson's got me thinking about who visits my blog. I run my blog on my own server and I have access to all of the logs. SharePoint sits on top of IIS and I can get all the information I need from the IIS logs. In this blog post I'll cover how I tackled going through my IIS logs and getting some useful information out of them, all with free products.
Where to begin?
I knew I wanted to get some information about the visitors to my blog, but I needed to decide exactly what information I wanted. Initially I was just curious about the number of people that were subscribed to my blog, and the number of unique visitors I get per month (or year, or whatever). I knew that information was in my IIS logs, I just needed a way to get it out. I decided to go with Log Parser. It's a free tool from Microsoft that provides a way to get information from a variety of log types in Windows, IIS logs included. It is just as flexible about output file types as it is input types. It's a command line tool, so it lends itself well scripting. That's great if you want to run these reports daily or monthly. Because of its flexibility you'll be able to use it for other monitoring in your environment once you get used to how it works.
The downside of Log Parser is that it can be tough to get up to speed on. The learning curve can be a little steep, depending on your background. Fortunately, there is a pretty good free GUI front end for Log Parser called Log Parser Lizard (LPL). I used that to get started with my Log Parser queries. LPL comes with many queries built in, including IIS queries and even ones that do what I wanted to do. The best part is that any queries you generate in LPL you can copy right out into the command line LogParser.exe. This is a great way to get started.
What information do I want to get?
Like I mentioned before, I was mainly interested in the stats mentioned in Joel's blog post. Once I started playing with LPL though, I saw all kind of other great information I could get out of my logs. My eyes started getting big, so I added some more stats to the list I wanted to get. Here's the list I came up with:
- Number of unique users
- Number of subscribers
- Top IP hits
- What referrers were sending people to my site
- Which pages were most popular
- How much data out
- Which HTTP status codes are being generated
To make things easy for this blog post I'll be getting all of these states for a one month period. I'll be getting the number of unique users per month, number of subscribers per month, etc. To change the interval you'd just change which log files you're interrogating. We'll cover that later.
How do I run these queries?
First I installed Log Parser and LPL on my SharePoint server. I could have installed it on a client and just copied the log files over. Either way works fine. To start I fired up LPL, here's what it looks like:
You can see there are a lot of log types that Log Parser can log and LPL provides samples for each. To keep my queries separate I created a new group for my SharePoint queries. I did this by clicking Queries in the toolbar and New Group from the popup.
Then I created a new query and called it Unique Users.
Then I saved the query. You'll notice there isn't any actual query yet. Since we are just doing IIS queries I looked at the examples in the IIS Logs group. Fortunately there is an example called Distinct Users that is almost exactly what I was looking for. It will just need a few tweaks. I brought the Distinct Users query up, copied it and pasted it into my new query.
Those of you that have done any SQL queries will recognize the format immediately. They are standard TSQL queries. This query works out of the box, but it needs some tweaking for my situation. First I needed to change what was being counted. The example looks for the number of distinct values in cs-username. Since this is a blog that is read anonymously, that column will always be empty. Instead, I'm using the c-ip column, which is the client IP address. If you're running these queries in a collaboration environment, using cs-username might make more sense.
Next, I need to alter the FROM portion. The phrase #IISW3C# is a constant that defines which web logs to query. Since I'm restricting my query to a single month of logs, I need to change my query to reflect that. I changed the FROM line to this:
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
This tells Log Parser to parse all the logs from September of 2008. If your IIS logs are in a different directory you'll have to adjust this line accordingly. I could have also changed the file name to ex08*.log to get all of the logs from this year. Next I needed to add a WHERE command to filter out all the logs from other parts of my web site that are not part of my blog. I don't want to pad my stats, after all. I added the following line to the bottom:
WHERE cs-uri-stem like '%/blog/%'
Cs-uri-stem is the column of the IIS log that contains the URL requested. TSQL queries use % as a wildcard variable. Now my whole query looks like this:
SELECT COUNT(DISTINCT c-ip) AS Users
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '%/blog/%'
To run it, I clicked Generate (you can also hit F5) in the toolbar. Here's what the results looked like:
The results show that in the month of September 11,264 distinct IP addresses accessed my blog. To expand the months included, alter the FROM command. If I wanted a list of IP address instead of the count, I could remove the COUNT() part of the query. I'll also through a command in to sort the IP addresses. The new query would look like this:
SELECT DISTINCT c-ip AS Users
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '%/blog/%'
ORDER by Users
Like I mentioned before, in a collaboration environment usernames are probably more important than IP addresses. To get a sorted list of users that have hit your site, replace c-ip with cs-username. This could be very useful to determine which users are hitting a site or even a specific document, like an HR document.
Let's look at some of the other queries I ran. The next was how many people are subscribed to my blog. This is just a modified version of my previous query. I simply added some WHERE clauses to pare down the results that are captured. To find the URL to filter on I clicked the RSS link on the front page of my blog. You'll see the URL points to /_layouts/listfeed.aspx followed by the list to watch. I simply altered my WHERE clause to include the listfeed.aspx page and added a filter for the query string. The query looks like this:
SELECT count(distinct c-ip) as Subscribers
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '/blog/_layouts/listfeed.aspx' and cs-uri-query like '%56f96349%'
When run it looks the results look like the following:
Next was which IPs were hitting my site the most. It's just a further extension of the stuff I had already done. I used the following query:
SELECT distinct c-ip AS IP, COUNT(*) AS [Total Hits]
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '/blog/%'
GROUP BY IP
ORDER BY [Total Hits] DESC
I got the following results:
No real surprises, the IP addresses I get the most hits from are search engines and aggregators. As a corollary I wanted to see what the sites that referred others to me were. This query gave me that information:
SELECT distinct cs(referer), COUNT(*) AS [Total Hits]
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '/blog/%'
GROUP BY cs(referer)
ORDER BY [Total Hits] DESC
For my blog the main referrers were other posts in my blog, and a few posts from other people's blogs.
Another related query is which URLs are hit the most on my site. Altering my query a little I came up with this:
SELECT distinct cs-uri-stem as Pages, COUNT(*) AS [Total Hits]
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '/blog/%'
GROUP BY Pages
ORDER BY [Total Hits] DESC
My next question was how much data have I served up? LPL has a canned Total Bytes Sent query, so I just modified it for my needs and I tweaked the output a little. Here's what I used:
SELECT DIV (SUM(sc-bytes) , 1048576) AS TotalMegaBytesSent, Count(*) as TotalHits
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '%/blog/%'
I changed the FROM command and added a WHERE clause. I also adjusted the SELECT statement. The IIS logs saves the sc-bytes data as bytes. That's really not much help to me, so I divided it by 1024 twice to go from bytes to kilobytes to megabytes. The results looked like this:
Looks like I've sent out roughly 5 GB in the month of September.
My final query was HTTP Status Codes. This was mainly just to see if there 404s. Again LPL came to the rescue. It had a query I could already use. I modified it to the following:
SELECT STRCAT(TO_STRING(sc-status), STRCAT('.', TO_STRING(sc-substatus))) AS Status,
COUNT(*) AS Total
FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log
WHERE cs-uri-stem like '%/blog/%'
GROUP BY Status
ORDER BY Total DESC
Here were the results.
Not a lot of 404s. That's good sign.
Those were all of the queries I wanted. Now I just need to work on my procedures a little. I saved each of my queries in LPL. The good news is that they work as is with the LogParser.exe. If you want to save the output you will need to specify an output file type. Here is the previous query run with LogParser.exe, specifying XML output to a file out.xml.
C:\Program Files\Log Parser 2.2>LogParser.exe -o:XML "SELECT STRCAT(TO_STRING(sc-status), STRCAT('.', TO_STRING(sc-substatus))) AS Status, COUNT(*) AS Total INTO out.xml FROM C:\WINDOWS\system32\LogFiles\W3SVC579986181\ex0809*.log WHERE cs-uri-stem like '%/blog/%' GROUP BY Status ORDER BY Total DESC"
I've bolded the parts I added for the output. I added –o:XML to tell LogParser.exe to use the XML output and INTO out.xml tells LogParser.exe the filename. The LogParser help file has all of the output types. Using template files (TPL in the help) you can specify unsupported types like HTML.
These examples were catered to an Internet site, but they could be altered to be used on internal sites as well. Like I mentioned before, you could use it to see which authenticated users are hitting which sites.
I hope this has helped. I know how much managers and bosses love pretty charts and graphs.
tk