When working on complex websites, we often end up with tons of URLs that are, in reality, the same page with different query-string arguments.

Why is this important?

  1. Filter out arguments that are creating unnecessary duplicates
  2. Reduce the number of distinct entries in reports and the risks of breaking the number of uniques imposed by most tools
  3. Increase the accuracy of Page Views
  4. Identify rogue query string parameters and wrong arguments (as well as those with mixed upper/lowercase which are likely also considered as distinct ones)

The solution

I love Analytics Canvas, so I thought – why not try to do a Query String analysis using it! The original idea comes from Peter O’Neill, who offers an Excel spreadsheet with macros to do this.

The algorithm looks like this:

  • Fetch data from GA: only paths with URLs, sorted by descending unique page views
  • Split the URL to get only the Query String part
  • Further split the query string to extract value-pairs (up to 5 arguments in this case) and put them in a new column named “arg”
  • Combine everything back but keep only the “arg” column
  • Filter out rows where “arg” is empty (i.e. there were less than 5 arguments)
  • Split out the “arg” column into “var=value” pairs and keep only the “var” column
  • Combine duplicate “var” and create a new column with a total of occurences of each one
  • Sort in alphabetical order
  • Store into Excel
  • Voilà!

 

It worked on a pretty complex website which had nearly 30k URLs with query string parameters and ended up with 195 distinct arguments!

Here’s how the results looks like:

Parameter Count
transit 9960
page 1463
sLang 1454

Once equipped with this information, we can adequately configure Google Analytics by filtering out unnecessary parameters and improve the overall quality of our analytics.