
Tracking Search Bots in Google Analytics

Normally, the Google Analytics Tracking Code is a JavaScript-based solution that relies on the browser executing the script and storing cookies. Most bots do neither, which is why your Google Analytics reports will NOT show data from most bots hitting your site.

But what if you actually want to track search bot activity, perhaps for technical SEO analysis? We’ve put together a PHP code library called “GA for Search Bots” that uses server-side processing to capture pageviews from a wide range of bots.

Although there is already a Google Analytics server-side script for mobile tracking, it is still designed to track only browser-based user agents. I’ve deconstructed the script and modified it so that the __utm.gif request is built and sent entirely server-side, allowing any user agent to be tracked.
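To make the idea concrete, here is a minimal sketch of assembling a __utm.gif request URL on the server. It is not the library's actual code: the function name `buildUtmGifUrl` is hypothetical, and the query parameter names and the version string are my recollection of the legacy mobile tracking script, so treat them as assumptions. The sketch only builds the URL; sending it (for example with cURL) is left out.

```php
<?php
// Hypothetical sketch: build a legacy __utm.gif request URL entirely server-side.
// Parameter names are assumed from the legacy GA mobile script, not taken from
// the GA for Search Bots library itself.
function buildUtmGifUrl(string $account, string $host, string $path, string $referer = '-'): string
{
    $params = [
        'utmwv' => '4.4sh',                          // tracker version string (assumed)
        'utmn'  => (string) mt_rand(0, 0x7fffffff),  // random number to defeat caching
        'utmhn' => $host,                            // hostname of the crawled site
        'utmr'  => $referer,                         // referrer, '-' when there is none
        'utmp'  => $path,                            // path of the crawled page
        'utmac' => $account,                         // account ID (assumed parameter name)
        'utmcc' => '__utma=999.999.999.999.999.1;',  // placeholder cookie value
    ];

    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'http://www.google-analytics.com/__utm.gif?' . http_build_query($params, '', '&');
}

$url = buildUtmGifUrl('MO-XXXXXX-YY', 'www.domain.com', '/some/page.php');
echo $url, "\n";
```

Because nothing here depends on JavaScript or cookies in the client, the same request can be built for any user agent that reaches the PHP script.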

Of course, we only want to track bots (and not humans) with this script so this code library also contains a configuration file with a ‘whitelist’ of user agents identified as bots. This will ensure that you have a ‘bot only’ Google Analytics profile.
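As a rough illustration of the whitelist idea, the sketch below checks a user agent against a map of regular expressions and bot names. The function name `identifyBot`, the variable `$botPatterns`, and the two example patterns are hypothetical; the library's real configuration file may be structured differently.

```php
<?php
// Hypothetical whitelist: regex pattern => bot name. Illustrative entries only,
// not the library's actual configuration.
$botPatterns = [
    'Googl(e|ebot)/([0-9.]{1,10})' => 'Google Bot',
    'bingbot/([0-9.]{1,10})'       => 'Bing Bot',
];

// Return the bot name for a user agent, or null when it is not whitelisted
// (in which case no hit should be sent).
function identifyBot(string $userAgent, array $patterns): ?string
{
    foreach ($patterns as $pattern => $name) {
        // '~' is used as the regex delimiter because the patterns contain '/'.
        if (preg_match('~' . $pattern . '~i', $userAgent) === 1) {
            return $name;
        }
    }
    return null;
}

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (identifyBot($ua, $botPatterns) !== null) {
    // Only at this point would the tracking request be built and sent.
}
```

Returning null for anything outside the whitelist is what keeps human visitors out of the ‘bot only’ profile in this sketch.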

How to set up ‘GA for Search Bots’

  1. Create a new ‘bots only’ profile and GA Web Property ID (in the form UA-XXXXXX-YY) in your Google Analytics account
  2. Download the ‘GA for Search Bots’ code library
  3. Unzip and place the ‘/gaforsearchbots/’ folder on your website (example: www.domain.com/gaforsearchbots/)
  4. Copy the GA for Search Bots Tracking Code found in sample.php and place it in your PHP source code (example: in your common ‘header’ include file)
  5. Edit the GA for Search Bots Tracking Code as follows:
    1. Set the $GA_SB_ACCOUNT variable to the GA Web Property ID, swapping out ‘UA’ for ‘MO’.
    2. Set the $GA_SB_PATH to the location of the ‘/gaforsearchbots/’ folder.
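As a rough illustration of step 5, the edited settings might look like the following. The two variable names come from the steps above; the helper `toMobileAccountId` and the example values are hypothetical, and the actual snippet in sample.php may be laid out differently.

```php
<?php
// Hypothetical helper: swap the 'UA' prefix of a Web Property ID for 'MO'.
function toMobileAccountId(string $webPropertyId): string
{
    return preg_replace('/^UA-/', 'MO-', $webPropertyId);
}

// Web Property ID of the 'bots only' profile, with 'UA' replaced by 'MO'.
$GA_SB_ACCOUNT = toMobileAccountId('UA-XXXXXX-YY'); // 'MO-XXXXXX-YY'

// Location of the library folder on your site (example path from step 3).
// Assumed here to be a URL path; check the library for the expected form.
$GA_SB_PATH = '/gaforsearchbots/';
```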

One thing to point out in this custom code library is that ‘source’ is set to the user agent, not the traditional campaign source. I found it easier to drill down to the different bots with this method. I would also pay more attention to Pageviews than to Visits to better analyze how the bots crawl your site.

Another important thing to understand is that this script only runs when the bot crawls a URL that will actually execute a PHP script with this code.  If a bot crawls a URL that doesn’t return a rendered page (for example, a 500 error), the GA for Search Bots code will not fire.  Because of that, this will not capture 100% of bot activity, especially for many error pages.  Understanding crawler activity against error URLs is extremely important and we’re working on a version 2 of this code to address that.

What can I see in my reports?

Although you can explore all of the standard reports in Google Analytics, I recommend the following Custom Report, which is set up for a dimension drilldown of Source -> Page. You are free to modify the report to fit your specific needs.

http://goo.gl/7U8Ul

One of the first things I see in the report is a list of the different bots crawling CardinalPath.com:

[Image: Google Analytics for Search Bots custom report]

You can see some of the different search engines crawling the site. The top entry of ‘Unknown-Robot’ is a catch-all entry for undefined bots. I believe I know of a few specific bots that are creating all of these hits, so what I should do is identify the user agents of these bots and enter them into the botconfig.php file.

Let’s take a look at Googlebot crawl activity over a period of 6 weeks:

[Image: Googlebot crawl in Google Analytics]

There are a few things we can see in this report.  First, we can see the daily crawl activity which seems to have an upward trend during this short period of time.  We also see the top pages that Googlebot is crawling frequently, which might give some insight as to what pages it thinks are important.

The code library is also configured to send a page-level custom variable with a date time stamp in case you want to see pages crawled, sorted by time.  Just choose a secondary dimension of ‘Custom Variable (Value 01)’ against the Pages primary dimension. Here is a breakdown of what time pages were crawled yesterday:
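For readers curious how such a timestamp could travel with the hit, here is a minimal sketch. It assumes the legacy custom variable layout `8(name)9(value)11(scope)` with scope 3 meaning page-level, which is my recollection of the old format rather than something confirmed by the library. The function name `buildCrawlTimeUtme` and the variable name ‘Crawl Time’ are hypothetical.

```php
<?php
// Hypothetical sketch: build a page-level custom variable carrying a crawl timestamp.
// The 8(name)9(value)11(scope) layout and scope 3 = page-level are assumptions
// based on the legacy GA custom variable format.
function buildCrawlTimeUtme(DateTimeInterface $when, string $name = 'Crawl Time'): string
{
    // 'Y-m-d H:i:s' sorts chronologically even when sorted as plain text.
    $value = $when->format('Y-m-d H:i:s');
    return sprintf('8(%s)9(%s)11(3)', $name, $value);
}

$when = new DateTimeImmutable('2012-03-01 14:05:09', new DateTimeZone('UTC'));
$utme = buildCrawlTimeUtme($when);
echo $utme, "\n"; // 8(Crawl Time)9(2012-03-01 14:05:09)11(3)
```

A year-first, zero-padded format is the design choice that matters here: it lets the report sort pages by crawl time using an ordinary text sort on the custom variable value.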

[Image: Google Analytics for Search Bots pages sorted by time]

The data I’ve seen has been pretty cool, but I’m sure there are more interesting things to be found. Please download the files and try it out for yourself.  Share other insightful reports that you discover, or let me know of any improvements that can be made to the code.

  • http://www.klipsomanie.com/ sac original

    Hello, I’m trying to use your script:
    1) It doesn’t seem to work with MO-xxxxxx-x, but it does work with UA-xxxxxx-x.
    2) I can’t use it with a WordPress blog. Do you know why?

    I added this line to botconfig.php to see Googlebot:

    'Googl(e|ebot)/([0-9.]{1,10})' => 'Google Bot',

  • B V

    Great article, will be taking a closer look at it.

    Just a quick heads-up: when I tried to use the FB Like button to share the post, I was unable to do so because the ‘Share this post’ bar it sits in cuts it off and hides the FB box.
    I’m using the Chrome browser.

  • Anna

    Hi Adrian,

    Thanks so much for sharing your useful tips. I have implemented the code on my blog and then tried to import the custom report into my GA account, but the import did not work.

    When I click on the link you shared, I am always redirected to my GA custom report dashboard, without any new report added.

    Also, so far no traffic shows up in the standard reports. Perhaps it is too early (I implemented the code less than 24 hours ago).

    Do this link and code snippet still work?

    Thanks for your time.

  • Barbara

    Hello Adrian,
    This is exactly what I’ve been looking for. I am having some difficulties with the report. Can you explain which variables or metrics I should be looking for?

  • Drew

    Hi, I realize this is from a couple years ago – could this also be made to work with Universal Analytics? Thanks for sharing this, it’s really cool :)



Copyright © 2014, All Rights Reserved. Privacy and Copyright Policies.