But what if you actually want to track search bot activity, perhaps for technical SEO analysis? We’ve put together a PHP code library called “GA for Search Bots” that uses server-side processing to capture pageviews from a wide range of bots.
Although there is already a Google Analytics server-side script for mobile tracking, it is still designed to track only activity from browser-based user agents. I’ve deconstructed the script and modified it so that all construction and sending of the __utm.gif request is handled completely server-side, allowing any user agent to be tracked.
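The server-side hit can be sketched roughly like this (illustrative PHP only, not the library’s actual source; the function name is mine, and the parameter names follow the publicly documented __utm.gif request format):

```php
<?php
// Illustrative sketch -- not the GA for Search Bots source code.
// Builds a __utm.gif pageview request entirely server-side, so no
// JavaScript ever needs to run in the visiting user agent.
function build_utm_gif_url($account, $hostname, $page) {
    $params = array(
        'utmwv' => '4.4sh',              // tracker version string (as used by the mobile snippet)
        'utmn'  => rand(0, 0x7fffffff),  // random number to defeat caching
        'utmhn' => $hostname,            // hostname of the tracked site
        'utmp'  => $page,                // page path for the pageview
        'utmac' => $account,             // account ID, e.g. MO-XXXXXX-YY
        'utmcc' => '__utma=999.999.999.999.999.1;', // minimal cookie payload
    );
    return 'http://www.google-analytics.com/__utm.gif?' . http_build_query($params);
}
```

The library would then request this URL from the server side (for example with cURL or `file_get_contents()`), passing along the bot’s original user agent so it appears in the reports.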
Of course, we only want to track bots (and not humans) with this script, so the code library also includes a configuration file with a ‘whitelist’ of user agents identified as bots. This ensures that you end up with a ‘bot only’ Google Analytics profile.
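A whitelist check of this kind might look something like the following (a minimal sketch; the real botconfig.php format and the function name here are assumptions):

```php
<?php
// Illustrative only: how a bot whitelist check might work.
// The actual botconfig.php entries may use a different structure.
$botWhitelist = array('Googlebot', 'bingbot', 'Slurp', 'Baiduspider');

function is_known_bot($userAgent, $whitelist) {
    foreach ($whitelist as $botToken) {
        // Case-insensitive substring match against the raw user agent
        if (stripos($userAgent, $botToken) !== false) {
            return true;  // track this hit in the bots-only profile
        }
    }
    return false;         // a human browser: send nothing to GA
}
```

Because only whitelisted user agents ever trigger a hit, regular visitors never show up in this profile, and your main (JavaScript-based) profile is unaffected.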
How to set up ‘GA for Search Bots’
- Create a new ‘bots only’ profile in your Google Analytics account and note its GA Web Property ID (i.e. UA-XXXXXX-YY)
- Download the ‘GA for Search Bots’ code library
- Unzip and place the ‘/gaforsearchbots/’ folder on your website (example: www.domain.com/gaforsearchbots/)
- Copy the GA for Search Bots Tracking Code found in sample.php and place it in your PHP source code (example: in your common ‘header’ include file)
- Edit the GA for Search Bots Tracking code for the following:
- Set the $GA_SB_ACCOUNT variable to the GA Web Property ID. Swap out ‘UA’ with ‘MO’.
- Set the $GA_SB_PATH to the location of the ‘/gaforsearchbots/’ folder.
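Put together, the edited snippet might look like this (the variable names come from the steps above; the include filename and exact placement are assumptions, so check sample.php for the actual code):

```php
<?php
// Sketch of the tracking snippet placed in a common header include.
// $GA_SB_ACCOUNT and $GA_SB_PATH are the variables described above;
// the include filename below is an assumption.
$GA_SB_ACCOUNT = 'MO-XXXXXX-YY';       // your Web Property ID, with 'UA' swapped for 'MO'
$GA_SB_PATH    = '/gaforsearchbots/';  // where you unzipped the library
require_once $_SERVER['DOCUMENT_ROOT'] . $GA_SB_PATH . 'gaforsearchbots.php';
```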
One thing to point out about this custom code library is that ‘source’ is set to the user agent, not a traditional campaign source. I found it easier to drill down to the different bots this way. I would also pay more attention to Pageviews than Visits to better analyze how the bots crawl your site.
Another important thing to understand is that this script only runs when a bot crawls a URL that actually executes a PHP script containing this code. If a bot crawls a URL that doesn’t return a rendered page (for example, a 500 error), the GA for Search Bots code will not fire. Because of that, it will not capture 100% of bot activity, especially for error pages. Understanding crawler activity against error URLs is extremely important, and we’re working on a version 2 of this code to address that.
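Until then, one partial workaround is to route soft errors through a PHP page that still executes the tracking code, such as a custom 404 handler (a sketch under assumptions: the filenames and include are hypothetical, and server-level failures like 500 errors would still be missed):

```php
<?php
// Hypothetical custom 404 page (e.g. wired up in Apache with:
//   ErrorDocument 404 /404.php
// ). Filenames here are illustrative, not part of the library.
header('HTTP/1.1 404 Not Found');      // keep the correct status so bots see a real 404
$GA_SB_ACCOUNT = 'MO-XXXXXX-YY';
$GA_SB_PATH    = '/gaforsearchbots/';
require_once $_SERVER['DOCUMENT_ROOT'] . $GA_SB_PATH . 'gaforsearchbots.php';
?>
<h1>Page not found</h1>
```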
What can I see in my reports?
Although you can explore all of the standard reports in Google Analytics, I recommend the following Custom Report, which is set up for the dimension drilldown of Source->Page. You are free to modify the report to fit your specific needs.
One of the first things I see in the report is a list of the different bots crawling CardinalPath.com.
You can see some of the different search engines crawling the site. The top entry, ‘Unknown-Robot’, is a catch-all for undefined bots. I believe a few specific bots are responsible for most of these hits, so the next step is to identify their user agents and add them to the botconfig.php file.
Let’s take a look at Googlebot’s crawl activity over a period of six weeks:
There are a few things we can see in this report. First, the daily crawl activity trends upward over this short period. We also see the top pages that Googlebot crawls most frequently, which might give some insight into which pages it considers important.
The code library is also configured to send a page-level custom variable with a date/time stamp, in case you want to see pages crawled sorted by time. Just choose a secondary dimension of ‘Custom Variable (Value 01)’ against the Pages primary dimension. Here is a breakdown of what time pages were crawled yesterday:
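Server-side, a page-level custom variable like this is typically encoded into the hit’s ‘utme’ parameter. A rough sketch of how the timestamp could be set follows (the slot number, label, and exact format the library uses are assumptions; the `8(...)9(...)11(...)` encoding is the documented custom-variable format):

```php
<?php
// Illustrative encoding of a page-level custom variable carrying a
// crawl timestamp. Slot and label names here are assumptions.
$label = 'crawl_time';
$value = date('Y-m-d H:i');   // e.g. 2012-05-01 14:30
$scope = 3;                   // 3 = page-level scope
$utme  = sprintf('8(%s)9(%s)11(%d)', $label, $value, $scope);
// Append '&utme=' . urlencode($utme) to the __utm.gif query string.
```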
The data I’ve seen has been pretty cool, but I’m sure there are more interesting things to be found. Please download the files and try it out for yourself. Share other insightful reports that you discover, or let me know of any improvements that can be made to the code.