Practical solution: extracting a list of Query Strings from URLs | Cardinal Path Blog
Blog

Practical solution: extracting a list of Query Strings from URLs

When working on complex websites, we often end up with tons of URLs that are, in reality, the same page with different query-string arguments.

Why is this important?

  1. Filter out arguments that are creating unnecessary duplicates
  2. Reduce the number of distinct entries in reports and the risks of breaking the number of uniques imposed by most tools
  3. Increase the accuracy of Page Views
  4. Identify rogue query string parameters and wrong arguments (as well as those with mixed upper/lowercase which are likely also considered as distinct ones)

The solution

I love Analytics Canvas, so I thought – why not try to do a Query String analysis using it! The original idea comes from Peter O’Neill, who offers an Excel spreadsheet with macros to do this.

The algorithm looks like this:

  • Fetch data from GA: only paths with URLs, sorted by descending unique page views
  • Split the URL to get only the Query String part
  • Further split the query string to extract value-pairs (up to 5 arguments in this case) and put them in a new column named “arg”
  • Combine everything back but keep only the “arg” column
  • Filter out rows where “arg” is empty (i.e. there were less than 5 arguments)
  • Split out the “arg” column into “var=value” pairs and keep only the “var” column
  • Combine duplicate “var” and create a new column with a total of occurences of each one
  • Sort in alphabetical order
  • Store into Excel
  • Voilà!

 

It worked on a pretty complex website which had nearly 30k URLs with query string parameters and ended up with 195 distinct arguments!

Here’s how the results looks like:

ParameterCount
transit9960
page1463
sLang1454

Once equipped with this information, we can adequately configure Google Analytics by filtering out unnecessary parameters and improve the overall quality of our analytics.

This entry was posted in Technology, Web Analytics and tagged . Bookmark the permalink.

Copyright © 2016, All Rights Reserved. Privacy and Copyright Policies.