Categories: Technology Services, Web Analytics

Analytics Fight Night Round 2: ATOP and ATOS are flawed samples

This post is a follow-up to Brian Katz’s Google Analytics Fight Night: No Metric is useless, not even Average Time on Site/Avg. Time on Page, and to my own Average-Time-On-Site and Average-Time-On-Page are Useless Blog Metrics.

Brian, I think you misinterpret my point, or else I misinterpret your interpretation of my point. I’m not saying that Google uses a measurement that can’t report accurately, but rather that its sampling methodology makes the data useless in specific situations, such as blogs.

Specifically, it seems to me that ATOP/ATOS uses an improperly selected sample of your data. It only looks at non-exit visitors, which might have some use (say, if you want to find something out about non-exit users, though I can’t imagine what), but is insufficient for drawing any conclusions about content or about users overall.

“No, ATOP is not flawed because it is correctly calculated (compare the 2 formulae above) and because it’s an average. If it were a measure of Total Time on Page, then it would be wholly incomplete and seriously flawed. But as an average, it is valid for comparisons and trends.”

The flaw I speak of is more about “what we’re measuring” than “how it’s measured”, and whether that is meaningful. ATOP is an average of time on page (TOP) for non-exit pageviews, but it is not an average across all users.
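To put rough numbers on that distinction, here is a minimal Python sketch. It assumes the commonly described calculation, in which time on page is only known when another pageview follows in the same visit, so exit views contribute nothing to the average. The `PageView` class, the function names, and every number are invented for illustration; none of this is taken from GA itself.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    true_seconds: float  # how long the reader really stayed (invisible to the tool on exits)
    is_exit: bool        # True if this was the last pageview of the visit


def atop(views):
    """Average time on page over non-exit views only (assumed ATOP-style calculation)."""
    measured = [v.true_seconds for v in views if not v.is_exit]
    return sum(measured) / len(measured) if measured else None


def true_average(views):
    """Average over every view, which the tool cannot actually observe."""
    return sum(v.true_seconds for v in views) / len(views)


# Hypothetical blog post: four readers read it fully and leave, one clicks onward quickly.
views = [
    PageView(300, True),
    PageView(240, True),
    PageView(360, True),
    PageView(300, True),
    PageView(20, False),
]

print(atop(views))          # 20.0
print(true_average(views))  # 244.0
```

The calculation is "correct" in both functions; they simply answer different questions, and only the first one is what gets reported.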

When looking at a blog post (which, again, is often an exit page), GA is only counting a small percentage of the visitors to this page under ATOP. While this would be more than enough for an accurate random sample, it’s… well… it’s not random. It’s only counting users who don’t exit, and discounting those who do. This means that certain users get counted more heavily than others. For instance, the following may be weighted more heavily:

JavaScript-Enabled Bots: Bots are going to be browsing multiple pages at a time.
First-Time Visitors: FTVs are more likely to visit multiple pages, since there is more content that they haven’t previously seen.
People who leave comments: Leaving a comment makes for a secondary page call, meaning the time they spent on the page gets counted.
etc.

Let’s take a purely hypothetical example:

The blog post Who is looking at your Facebook? 3 Privacy Options you NEED to set is very text heavy. Let’s say I decided to spruce it up a bit, add more images, add more headlines, bullet points. Generally I improve the entire user experience. When I check, the ATOP improved, but not nearly enough to justify the time. Sadly I slink away.

Meanwhile, our pages get browsed by bots a lot. And I DO mean a lot. These continue browsing away at their normal rate.

In all actuality, however, I’ve increased the number of people who are fully reading the page, increasing the time on page for 90% of users. However, because “real” users (actual people at their keyboards) are leaving on that page, I am not seeing many of their numbers included in this average. So the change is there and noticeable, but its size, and therefore its meaningfulness, is up in the air.

Of course these are very contrived situations – it’s not like actual bots are all sitting there for a page load, doing an instant spider, then zipping away. Rather, what I am trying to illustrate is that ATOP is, by its nature, noisy data. There’s just too much potential for error in any measurement that selectively ignores a majority of users.
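The same contrived scenario can be sketched in a few lines of Python. All figures here are made up: 900 human views of which only 45 continue to another page, 100 bot views that all continue, and a redesign that doubles human reading time from 60 to 120 seconds. The function names are hypothetical, and the calculation again assumes that only non-exit views are counted.

```python
def reported_atop(groups):
    """Weighted average over non-exit views only.

    Each group is (total_views, non_exit_views, avg_seconds).
    """
    counted = sum(non_exit for _, non_exit, _ in groups)
    total_seconds = sum(non_exit * secs for _, non_exit, secs in groups)
    return total_seconds / counted


def true_average(groups):
    """Weighted average over every view, exits included."""
    views = sum(total for total, _, _ in groups)
    return sum(total * secs for total, _, secs in groups) / views


# (total_views, non_exit_views, avg_seconds): humans first, then bots
before = [(900, 45, 60), (100, 100, 5)]
after = [(900, 45, 120), (100, 100, 5)]  # only the human reading time changes

reported_gain = reported_atop(after) - reported_atop(before)
true_gain = true_average(after) - true_average(before)

print(round(reported_gain, 1), round(true_gain, 1))
```

With these invented inputs the reported average rises by roughly 19 seconds while the real average across all views rises by 54. Change the mix of bots and exits and the reported figure moves around, even though the underlying improvement stays the same.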

“The worst that one can say of ATOP is that the sub-set of a post’s views included in the calculation is not representative of all the post’s views. However, it should be used as a metric to measure the engagement of visitors who found the post relevant. As such, if post A is more engaging than post B, would that not apply both to visitors who came to read it and left as well as those who came to read it and viewed another?”

If I understand what you’re saying, you presume that people who exited on that page didn’t find it relevant, when it’s very possible that they did. In fact, a reader could have spent 10 minutes analyzing every part of that post, then left to mull it over, returned 40 minutes later (now they’re a new visit), and just gone straight to comment. Now the TOP for their initial (long) visit isn’t counted, and the TOP for their secondary (short) visit is.
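Here is a rough Python sketch of that returning-reader case. It assumes a 30-minute inactivity timeout between visits and that time on page is the gap to the next hit in the same visit; the hit timestamps (in minutes), the page labels, and the function names are all hypothetical.

```python
SESSION_TIMEOUT_MIN = 30  # assumed inactivity timeout that ends a visit


def split_sessions(hits, timeout=SESSION_TIMEOUT_MIN):
    """Group (minute, page) hits into visits; a gap longer than `timeout` starts a new one."""
    sessions = []
    for hit in sorted(hits):
        if sessions and hit[0] - sessions[-1][-1][0] <= timeout:
            sessions[-1].append(hit)
        else:
            sessions.append([hit])
    return sessions


def measured_times(hits, page):
    """Time on `page` as the gap to the next hit in the same visit; exit views yield nothing."""
    times = []
    for session in split_sessions(hits):
        for (t, p), (t_next, _) in zip(session, session[1:]):
            if p == page:
                times.append(t_next - t)
    return times


# Minute 0: reader opens the post, studies it for 10 minutes, then leaves (no further hit).
# Minute 50: reader returns to the post and submits a comment one minute later.
hits = [(0, "post"), (50, "post"), (51, "comment")]

print(split_sessions(hits))          # two separate visits
print(measured_times(hits, "post"))  # [1]
```

The 10-minute read never shows up, because that visit ended on the post; only the 1-minute return visit is measured.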

Remember: it’s not just bounces that don’t get counted, but exits.

ATOS is the same, though perhaps not AS bad as I thought (provided we use custom segments). But it’s still using a biased sample, which means that we have to be incredibly wary of insights we garner from it.

All in all, these measurements seem troublesome in almost any setting, but they are especially troublesome on pages with high exit rates, such as blog posts.

On a side note, your last Analytics Fight Night post had an exit rate of 64%! Quite low compared to some of our others.