Normally, the Google Analytics Tracking Code is a JavaScript-based solution that relies on browser rendering and cookie storage. These dependencies are why your Google Analytics reports will NOT show data from most bots hitting your site.

But what if you actually want to track search bot activity, perhaps for technical SEO analysis? We’ve put together a PHP code library called “GA for Search Bots” that uses server-side processing to capture pageviews from a wide range of bots.

Although there is already a Google Analytics server-side script for mobile tracking, it is still designed to track only activity from browser-based user agents. I’ve deconstructed the script and modified it so that the construction and sending of the __utm.gif request are handled entirely server-side, allowing any user agent to be tracked.
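The Python sketch below shows roughly what building and sending the __utm.gif request server-side involves. The library itself is PHP. The server fills in every value a browser would normally supply. Parameter names follow the legacy ga.js / mobile ga.php conventions as far as we understand them. The helper names, the version string and the placeholder cookie are assumptions, not the library's code, and the legacy endpoint may no longer accept hits. Treat it as an illustration of the request shape only.

```python
"""Minimal sketch: build and send a legacy __utm.gif pageview hit server-side.

Assumptions (not taken from the GA for Search Bots source): parameter names and
the fixed __utma placeholder follow the legacy ga.js / mobile ga.php conventions;
build_utm_gif_url and send_hit are hypothetical helper names.
"""
import random
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GIF_ENDPOINT = "http://www.google-analytics.com/__utm.gif"


def build_utm_gif_url(account: str, host: str, page: str, visitor_id: str,
                      ip: str, referrer: str = "-") -> str:
    """Return the GET URL for a single pageview hit."""
    params = {
        "utmwv": "4.4sh",                            # tracking-code version (assumed)
        "utmn": str(random.randint(0, 0x7FFFFFFF)),  # random cache buster
        "utmhn": host,                               # hostname of the crawled site
        "utmr": referrer,                            # referrer, "-" when there is none
        "utmp": page,                                # path of the crawled page
        "utmac": account,                            # MO-XXXXXX-YY account ID
        "utmcc": "__utma=999.999.999.999.999.1;",    # placeholder cookie: bots keep none
        "utmvid": visitor_id,                        # visitor ID chosen by the server
        "utmip": ip,                                 # requesting IP address
    }
    return GIF_ENDPOINT + "?" + urlencode(params)


def send_hit(url: str, user_agent: str) -> int:
    """Request the GIF, forwarding the bot's user agent; returns the HTTP status."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=5) as resp:
        return resp.status
```

Because nothing here depends on JavaScript or cookies stored by the client, the same call works whether the visitor is a browser or a crawler.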

Of course, we only want to track bots (and not humans) with this script, so the code library also contains a configuration file with a ‘whitelist’ of user agents identified as bots. This ensures that you have a ‘bot only’ Google Analytics profile.
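The Python sketch below shows a whitelist check of this kind. It is for illustration only, since the library itself is PHP. The pattern list, its ordering and the function name are assumptions, not the contents of the real botconfig.php. Specific bots are listed first. Generic tokens such as "bot" or "spider" fall through to a catch-all label like the ‘Unknown-Robot’ entry seen in the reports. Anything that matches nothing is treated as a human and is not tracked.

```python
"""Minimal sketch of a bot user-agent whitelist.

BOT_WHITELIST and match_bot are hypothetical names; the real botconfig.php
may use different patterns and structure.
"""
from typing import Optional

# Ordered (substring, label) pairs: specific bots first, generic tokens last.
BOT_WHITELIST = [
    ("googlebot", "Googlebot"),
    ("bingbot", "Bingbot"),
    ("slurp", "Yahoo! Slurp"),
    ("baiduspider", "Baiduspider"),
    ("yandex", "YandexBot"),
    # Generic catch-all tokens for any other self-identified crawler.
    ("bot", "Unknown-Robot"),
    ("crawler", "Unknown-Robot"),
    ("spider", "Unknown-Robot"),
]


def match_bot(user_agent: str) -> Optional[str]:
    """Return a bot label, or None for user agents not on the whitelist."""
    ua = user_agent.lower()
    for needle, label in BOT_WHITELIST:
        if needle in ua:
            return label
    return None  # not a known bot: do not send a hit
```

Matching stops at the first hit, so order matters. A Googlebot user agent also contains "bot", so the specific entry must come before the generic one.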

How to set up ‘GA for Search Bots’

  1. Create a new ‘bots only’ profile and GA Web Property ID (e.g. UA-XXXXXX-YY) in your Google Analytics account.
  2. Download the ‘GA for Search Bots’ code library.
  3. Unzip it and place the ‘/gaforsearchbots/’ folder on your website (example: www.domain.com/gaforsearchbots/).
  4. Copy the GA for Search Bots Tracking Code from sample.php into your PHP source code (example: your common ‘header’ include file).
  5. Edit the GA for Search Bots Tracking Code as follows:
    1. Set the $GA_SB_ACCOUNT variable to the GA Web Property ID, replacing the ‘UA’ prefix with ‘MO’ (e.g. MO-XXXXXX-YY).
    2. Set the $GA_SB_PATH variable to the location of the ‘/gaforsearchbots/’ folder.
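Step 5.1 is a simple prefix swap. The small Python helper below shows the expected shape of the value. It is hypothetical and for illustration only. In practice you would type the MO- ID directly into the PHP file.

```python
import re

# UA-<account>-<property>, e.g. UA-123456-7
_UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")


def to_mobile_account(web_property_id: str) -> str:
    """Turn 'UA-XXXXXX-YY' into the 'MO-XXXXXX-YY' form used for $GA_SB_ACCOUNT."""
    m = _UA_PATTERN.match(web_property_id.strip())
    if m is None:
        raise ValueError(f"not a UA- web property ID: {web_property_id!r}")
    return f"MO-{m.group(1)}-{m.group(2)}"
```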

One thing to point out in this custom code library is that ‘source’ is set as the user agent, not the traditional campaign source. I found it easier to drill down to the different bots with this method. I would also pay a little more attention to Pageviews rather than Visits to better analyze how the bots crawl your site.
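In the legacy ga.js format, the campaign source normally lives in the utmcsr field of the __utmz cookie value sent in utmcc. The Python sketch below assumes the library sets ‘source’ the same way, with the bot name in place of a real campaign source. That is an assumption about the mechanism, not a description of the library. The function name and the field values other than utmcsr are illustrative.

```python
"""Minimal sketch: put the bot name in the campaign-source slot of the cookie string.

Assumption: 'source' is set through a legacy __utmz cookie value passed in utmcc;
build_utmcc is a hypothetical name and fields other than utmcsr are illustrative.
"""
import time


def build_utmcc(bot_label: str, domain_hash: int = 999) -> str:
    """Return a utmcc string whose __utmz campaign source (utmcsr) is bot_label."""
    now = int(time.time())
    utma = f"__utma={domain_hash}.999.{now}.{now}.{now}.1;"
    # __utmz layout: domainhash.timestamp.sessions.campaigns.<pipe-separated fields>
    utmz = (f"__utmz={domain_hash}.{now}.1.1."
            f"utmcsr={bot_label}|utmccn=(direct)|utmcmd=(none);")
    # Cookies are joined with '+', which is percent-encoded (%2B) in the final URL.
    return utma + "+" + utmz
```

Each bot label then appears as its own row under Source, which is what makes the per-bot drilldown straightforward.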

Another important point is that this script runs only when a bot requests a URL that actually executes a PHP script containing this code. If a bot crawls a URL that doesn’t return a rendered page (for example, a 500 error), the GA for Search Bots code will not fire. As a result, it will not capture all bot activity, and in particular it misses many error pages. Understanding crawler activity against error URLs is extremely important, and we’re working on a version 2 of this code to address that.

What can I see in my reports?

Although you can explore all of the standard reports in Google Analytics, I recommend the following Custom Report, which is set up with a Source->Page dimension drilldown. You are free to modify the report to fit your specific needs.

http://goo.gl/7U8Ul

One of the first things I see in the report is a list of the different bots crawling CardinalPath.com:

[Screenshot: Google Analytics for Search Bots custom report]

You can see some of the different search engines crawling the site. The top entry, ‘Unknown-Robot’, is a catch-all for undefined bots. I believe a few specific bots are generating most of these hits, so my next step is to identify their user agents and add them to the botconfig.php file.
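One way to find the user agents behind ‘Unknown-Robot’ is to run user-agent strings through the same rules, for example strings taken from your web server’s access log. Keep only those that match a generic token but no specific entry. The sketch below assumes a simplified substring-based config. The pattern lists and the function name are hypothetical, and log parsing is left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical patterns: entries that already have their own label...
SPECIFIC = ["googlebot", "bingbot", "slurp"]
# ...and the generic tokens that make up the 'Unknown-Robot' catch-all.
GENERIC = ["bot", "crawler", "spider"]


def unknown_robot_agents(user_agents: Iterable[str],
                         top: int = 10) -> List[Tuple[str, int]]:
    """Count user agents that only hit the generic catch-all, most frequent first."""
    counts: Counter = Counter()
    for ua in user_agents:
        low = ua.lower()
        if any(s in low for s in SPECIFIC):
            continue  # already has its own label
        if any(g in low for g in GENERIC):
            counts[ua] += 1
    return counts.most_common(top)
```

The most frequent strings in the result are the best candidates for new, specific entries in botconfig.php.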

Let’s take a look at Googlebot crawl activity over a six-week period:

[Screenshot: Googlebot crawl activity in Google Analytics]

There are a few things we can see in this report. First, the daily crawl activity seems to trend upward over this short period. We also see the top pages that Googlebot crawls frequently, which may hint at which pages it considers important.

The code library is also configured to send a page-level custom variable containing a date/time stamp, in case you want to see crawled pages sorted by time. Just choose a secondary dimension of ‘Custom Variable (Value 01)’ against the Pages primary dimension. Here is a breakdown of when pages were crawled yesterday:
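In the legacy ga.js hit format, custom variables travel in the utme parameter of the __utm.gif request, roughly as 8(name)9(value)11(scope), where scope 3 means page-level. The Python sketch below builds that value for slot 1 under those assumptions. The variable name "Crawl Time", the timestamp format and the function names are hypothetical and not taken from the library.

```python
"""Minimal sketch: a page-level custom variable carrying the crawl timestamp.

Assumes the legacy utme format 8(name)9(value)11(scope) for slot 1;
all names here are hypothetical.
"""
from datetime import datetime

PAGE_SCOPE = 3  # legacy scope codes: 1 = visitor, 2 = session, 3 = page


def escape_utme(text: str) -> str:
    """Escape characters that are special inside utme; the quote must go first."""
    return text.replace("'", "'0").replace(")", "'1").replace("*", "'2")


def crawl_time_var(crawled_at: datetime, name: str = "Crawl Time") -> str:
    """Return the utme value for custom variable slot 1 with a sortable timestamp."""
    value = crawled_at.strftime("%Y-%m-%d %H:%M:%S")  # zero-padded, year first
    return f"8({escape_utme(name)})9({escape_utme(value)})11({PAGE_SCOPE})"
```

A zero-padded, year-first format sorts alphabetically in chronological order, so sorting the ‘Custom Variable (Value 01)’ column by value also sorts pages by crawl time.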

[Screenshot: Google Analytics for Search Bots, pages sorted by crawl time]

The data I’ve seen has been pretty cool, but I’m sure there are more interesting things to be found. Please download the files and try it out for yourself.  Share other insightful reports that you discover, or let me know of any improvements that can be made to the code.
