How to Keep Referral Spam and Bots out of Your Analytics Data

There are both good and bad people in the world of the internet… And, like always, some of the good guys spend a lot of their time trying to figure out how to stop the bad guys from causing trouble. Don’t worry, I’m one of the good guys.

As Google Analytics grows more popular for web analytics and online behavior tracking, we see an increasing number of bot and referral spam hits in GA accounts. Sometimes these can amount to 35-40% of the total traffic being reported, skewing every metric from session count and average session duration to bounce rate and conversion rate.

What is referral spam?
The first step in solving any problem is recognizing that one exists.

One of the biggest misconceptions around referral spam is that it is recorded by the website sending information to the GA account, which gives rise to the notion that the attackers will always target websites of established companies and popular domains. However, referral spam has nothing to do with the website. In fact, in most cases, the attackers do not even (need to) know the website/domain name before they target it.

All they need is the GA property ID. They generate these IDs in huge volumes as random numbers matching the GA pattern of UA-99999999-99, then use custom scripts on their own servers to send hits to all of these GA properties.
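To make the point concrete, here is a minimal Python sketch of a check for that ID shape. The digit ranges in the regular expression are an assumption made for illustration (the text above only gives the UA-99999999-99 pattern), and `looks_like_ua_id` is a hypothetical helper name.

```python
import re

# Assumed shape of a property ID, based on the UA-99999999-99 pattern above:
# "UA-", an account number, "-", then a property index.
# The digit counts (4-10 and 1-4) are illustrative guesses, not an official spec.
UA_ID = re.compile(r"UA-\d{4,10}-\d{1,4}")


def looks_like_ua_id(value: str) -> bool:
    """Return True if the whole string has the UA-<digits>-<digits> shape."""
    return UA_ID.fullmatch(value) is not None


print(looks_like_ua_id("UA-12345678-1"))  # True
print(looks_like_ua_id("UA-example-1"))   # False
```

Nothing in the ID ties it to a domain name, which is why a script can send hits to properties at random without knowing which site they belong to.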

The primary motive of the attackers is to lure you back to their site to sell you something. If you look at your GA acquisition reports and find that 20% of your traffic is being referred to your site by free-video-tool.com, under normal circumstances you'd want to find out who they are and thank them. But when you do visit their site, you realize it's a very fishy-looking ecommerce site that has nothing to do with your site or business…

How to identify referral spam in Google Analytics?
Acquisition and Behavior Metrics
There are 3 criteria collectively used to identify referral spam sources:

  • Bounce Rate = 100%
  • % New Sessions = 100%
  • Average Session Duration = 00:00:00

The Referrals report is a good place to identify it. Sort your report in decreasing order of Bounce Rate and if you have spam referrals, you should see them. They look something like this:

It’s important to look for all 3 criteria together, since it’s perfectly normal to have a source with a few sessions and a 100% bounce rate or 00:00:00 session duration, or even a brand new source with 100% new sessions.
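The combined check can be sketched in a few lines of Python. This is only an illustration: the `SourceRow` structure and the sample rows are hypothetical stand-ins for data exported from the Referrals report, not real figures.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    source: str
    sessions: int
    bounce_rate: float           # percent, 0-100
    pct_new_sessions: float      # percent, 0-100
    avg_session_duration: float  # seconds


def is_spam_candidate(row: SourceRow) -> bool:
    """Flag a source only when all three criteria hold at once."""
    return (
        row.bounce_rate == 100.0
        and row.pct_new_sessions == 100.0
        and row.avg_session_duration == 0.0
    )


# Hypothetical rows, made up for illustration.
rows = [
    SourceRow("free-video-tool.com", 120, 100.0, 100.0, 0.0),
    SourceRow("google.com", 300, 45.0, 60.0, 95.0),
    SourceRow("new-partner.example", 2, 100.0, 100.0, 12.0),
]

candidates = [row.source for row in rows if is_spam_candidate(row)]
print(candidates)  # ['free-video-tool.com']
```

The third row shows why the criteria are checked together: a brand new source with a 100% bounce rate is not flagged, because its sessions have a non-zero duration.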

Last but not least, an important factor is the name of the source itself. In most cases, the name is a clear giveaway. Looking at the image above, free-video-tool.com, keywords-monitoring-your-success.com and magicdiet.gq are obvious candidates, but google.com is not. You may also see a few domains that you recognize on this list that meet all 3 criteria. This simply means that real visitors from that source bounced off your site as soon as they landed.

If you’re unable to identify a source as spam just by looking at its name, your last resort is to visit the domain in question to find out. Please make sure you have appropriate antivirus and anti-malware software set up on your system before you do so, since some of these sites can harm your system the moment you enter them.

Technology Metrics

This is a more technical approach to identifying spam, but it can remove large chunks of it if done correctly. Looking at the Technology reports in GA, you can identify browser versions that skew your data.

Here’s a good example of this:

The image shows Browser Version 46.0.2490.80, which is an October 2015 version of Google Chrome. As of July 2016, Chrome is running 51.0.2704.106 as the latest stable version on all devices, and ideally, nobody should be using such an old version. The spam qualifying metrics are also extremely close to the mark, thereby skewing 12.83% of the reported data! In this case, it’s safe to conclude that the spam is coming from Browser Version 46.0.2490.80.

You can check the latest versions of Google Chrome here.
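As a rough sketch, the version comparison can be done by parsing the dotted version strings. The two version numbers come from the text above; `parse_version` and `majors_behind` are hypothetical helper names, and how many major versions count as "too old" is left to your own judgment.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '46.0.2490.80' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Latest stable Chrome as of July 2016, per the text above.
LATEST_STABLE = parse_version("51.0.2704.106")


def majors_behind(version: str, latest: Tuple[int, ...] = LATEST_STABLE) -> int:
    """Number of major releases the given version trails the latest one."""
    return latest[0] - parse_version(version)[0]


print(majors_behind("46.0.2490.80"))  # 5
print(parse_version("46.0.2490.80") < LATEST_STABLE)  # True
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically, where "9.0" would sort after "51.0".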

The next image shows Flash Version 11.5 also skewing 21.39% of the analytics data with metrics alarmingly close to the spam matching criteria:

The latest versions of Flash Player are 18.0 or 22.0 for most browsers/platforms. You can check them out here!

Bad Hostnames

Although less frequent and lower in volume, this form of data corruption is neither spam nor a referral. In the All Pages report, switch to Hostname as your primary dimension. Sometimes you may see hostnames that are not a part of your business. They may just mean that other websites are inadvertently using your GA Property ID.

In the image below, we see a few hostnames: cmobilescreen.com, web.archive.org, etc. that are not part of the Cardinal Path brand of websites. The numbers show that they constitute a small fraction of the data, but it’s always good to identify any element that corrupts the cleanliness and integrity of your reports and weed it out.
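To get a feel for how much data such hostnames account for, a small Python sketch like the following can help. The pageview counts and the `www.cardinalpath.com` hostname are hypothetical values chosen for illustration; only the two foreign hostnames come from the example above.

```python
OWNED_DOMAIN = "cardinalpath.com"


def is_owned(hostname: str, domain: str = OWNED_DOMAIN) -> bool:
    """True for the domain itself or any of its subdomains."""
    host = hostname.lower()
    return host == domain or host.endswith("." + domain)


# Hypothetical pageview counts per hostname, made up for illustration.
pageviews_by_hostname = {
    "www.cardinalpath.com": 9800,
    "cmobilescreen.com": 150,
    "web.archive.org": 50,
}

foreign = {h: v for h, v in pageviews_by_hostname.items() if not is_owned(h)}
foreign_share = sum(foreign.values()) / sum(pageviews_by_hostname.values())

print(sorted(foreign))         # ['cmobilescreen.com', 'web.archive.org']
print(f"{foreign_share:.1%}")  # 2.0%
```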

How does referral spam affect my reports?

Let’s look at the data in the first image in this document and assume that 20% of all sessions qualify as spam/bot referrals and bad hostnames. Let’s analyze the impact of excluding these sessions on the behavior metrics:

It’s clear from the analysis that all metrics are not only more accurate, but also significantly better when spam sources are excluded.
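The arithmetic behind this kind of comparison can be sketched as follows. All of the totals are hypothetical numbers picked to make the math easy to follow, and the sketch assumes that every spam session bounces, lasts zero seconds and never converts.

```python
def rates(sessions, bounces, duration_seconds, conversions):
    """Compute the three behavior metrics from raw totals."""
    return {
        "bounce_rate": bounces / sessions,
        "avg_session_duration": duration_seconds / sessions,
        "conversion_rate": conversions / sessions,
    }


# Hypothetical totals for one reporting period.
sessions = 10_000
spam_sessions = 2_000         # 20% of all sessions
bounces = 6_000               # includes every spam session
duration_seconds = 1_200_000  # spam sessions contribute 0 seconds
conversions = 200             # spam sessions never convert

reported = rates(sessions, bounces, duration_seconds, conversions)
cleaned = rates(
    sessions - spam_sessions,
    bounces - spam_sessions,
    duration_seconds,
    conversions,
)

print(reported)  # bounce 0.6, duration 120.0 s, conversion 0.02
print(cleaned)   # bounce 0.5, duration 150.0 s, conversion 0.025
```

Under these assumptions, removing the spam sessions lowers the bounce rate from 60% to 50%, and raises the average session duration from 120 to 150 seconds and the conversion rate from 2% to 2.5%.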

How do I keep referral spam out of my reports?

Now that we’ve seen how to identify spam in your GA account and how it affects your reporting and metrics, the next step is to stop it from corrupting your data. Cleaning up your reports takes a combination of measures:

Google Analytics Bot Filter

GA provides a built-in bot filter at the View level. Go to View Settings and check the box for Bot Filtering.

Manual Exclude Filter

While GA’s bot filter is fairly effective, it may not cover every spam referral source. Any additional spam sources identified in the reports can be excluded using manual filters.

Create an Exclude filter for each Campaign Source, Browser Version or Flash Version value that you have identified as spam. The screenshots below show sample filters created for Campaign Source on free-video-tool.com and magicdiet.gq, and for Flash Version 11.5.
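When several spam sources need to go into one Campaign Source filter, the filter pattern is typically a regular expression that joins them with `|`. The Python sketch below shows one way such a pattern could be assembled and sanity-checked before pasting it into GA. `build_exclude_pattern` is a hypothetical helper, and Python's `re` module is used here only as a stand-in for GA's own pattern matching.

```python
import re


def build_exclude_pattern(domains):
    """Join spam domains into one alternation pattern.

    Domain names contain only letters, digits, hyphens and dots, so
    escaping the dots is enough to make each name match literally.
    """
    unique = sorted(set(domains))
    return "|".join(domain.replace(".", r"\.") for domain in unique)


spam_sources = ["free-video-tool.com", "magicdiet.gq"]
pattern = build_exclude_pattern(spam_sources)
print(pattern)  # free-video-tool\.com|magicdiet\.gq

# Quick sanity check against sources that should and should not match.
print(bool(re.search(pattern, "free-video-tool.com")))  # True
print(bool(re.search(pattern, "google.com")))           # False
```

Testing the pattern against a known-good source such as google.com before saving the filter is a cheap way to avoid excluding legitimate traffic by mistake.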

Manual Include Filter

Similar to the exclude filter, it’s good to have an include filter to keep out any traffic that is not in your “white-listed” set of domains. For instance, the filter to keep out all traffic without cardinalpath.com in the hostname looks like this:
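As a rough sketch, the hostname pattern for such an include filter can be prototyped in Python before it is entered in GA. The pattern `cardinalpath\.com` and the sample hostnames are assumptions for illustration, and Python's `re` module only approximates how GA evaluates filter patterns.

```python
import re

# Assumed include pattern: keep only hits whose hostname contains cardinalpath.com.
INCLUDE_HOSTNAME = re.compile(r"cardinalpath\.com")


def passes_include_filter(hostname: str) -> bool:
    """True if the hit would be kept by the include filter."""
    return INCLUDE_HOSTNAME.search(hostname) is not None


for host in ["www.cardinalpath.com", "cmobilescreen.com", "web.archive.org"]:
    print(host, passes_include_filter(host))
# www.cardinalpath.com True
# cmobilescreen.com False
# web.archive.org False
```

An unanchored pattern like this one also accepts any hostname that merely contains the text cardinalpath.com, so a stricter anchored pattern may be worth considering if that matters for your setup.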

 

In accordance with GA filter behavior, neither of these methods is retroactive. This means that spam/bot data that already exists in the reports cannot be removed. The filters will simply block spam referrals and other data corruption going forward.

You can learn more about RegEx here or at the link provided on the filter configuration page.

What else do I need to know?

Regular Checks

It also helps to update these filters regularly with new spam sources. The right cadence depends on many factors within your organization; we recommend a check-in every two to four weeks.

Referral Exclusion List

Do NOT use the Referral Exclusion list in Property > Tracking Info to block spam domains. This list is meant to preserve sessions across your own internal cross-domain traffic. Adding spam source domains to it will only hide the spam referrer from your reports; it won’t actually remove the bot traffic.

Annotations

Annotations are a handy GA tool to keep track of significant activity on your GA account / website that may affect traffic trends. Implementing bot and manual filters will result in a noticeable drop in traffic volume being reported and it’s good to keep a note of when the filters were applied.

Conclusion

These are some of the most common sources of spam and data corruption in GA reports. Carefully identifying and filtering them out improves the cleanliness, integrity and reliability of your reports and, ultimately, gives you more confidence in your analysis and insights, helping you make better decisions for your business.

Whether you’re looking for a vendor-agnostic expert to help you with your analytics implementation, or if you are looking for an analytics audit, we can help. Contact us to speak with an expert about your needs.

Pratik Gupta

Pratik is an IT-Business enthusiast, with several years of IT experience. He began his career with Accenture's technology delivery vertical as a Software Engineer. Pratik has also worked as an IT Systems Administrator at UIC - School of Continuing Studies where he performed a variety of tasks like IT Helpdesk support, web and database development and maintenance, as well as end-to-end SDLC execution of small-scale IT projects. In addition to this, he helped UIC kick-start their Digital Marketing efforts by learning and implementing Google Analytics. He also has an interest and basic understanding of Google AdWords and Tableau, as well as being Google Analytics Premium Certified. Pratik holds a Bachelor’s degree in Computer Engineering from the University of Mumbai and a Masters in Management Information Systems from University of Illinois - Chicago (UIC).

Published by
Pratik Gupta
