Practical solution: extracting a list of Query Strings from URLs

When working on complex websites, we often end up with tons of URLs that are, in reality, the same page with different query-string arguments.

Why is this important?

  1. Filter out arguments that are creating unnecessary duplicates
  2. Reduce the number of distinct entries in reports and the risks of breaking the number of uniques imposed by most tools
  3. Increase the accuracy of Page Views
  4. Identify rogue query string parameters and wrong arguments (as well as those with mixed upper/lowercase which are likely also considered as distinct ones)

The solution

I love Analytics Canvas, so I thought – why not try to do a Query String analysis using it! The original idea comes from Peter O’Neill, who offers an Excel spreadsheet with macros to do this.

The algorithm looks like this:

  • Fetch data from GA: only paths with URLs, sorted by descending unique page views
  • Split the URL to get only the Query String part
  • Further split the query string to extract value-pairs (up to 5 arguments in this case) and put them in a new column named “arg”
  • Combine everything back but keep only the “arg” column
  • Filter out rows where “arg” is empty (i.e. there were less than 5 arguments)
  • Split out the “arg” column into “var=value” pairs and keep only the “var” column
  • Combine duplicate “var” and create a new column with a total of occurences of each one
  • Sort in alphabetical order
  • Store into Excel
  • Voilà!

 

It worked on a pretty complex website which had nearly 30k URLs with query string parameters and ended up with 195 distinct arguments!

Here’s how the results looks like:

Parameter Count
transit 9960
page 1463
sLang 1454

Once equipped with this information, we can adequately configure Google Analytics by filtering out unnecessary parameters and improve the overall quality of our analytics.

Stéphane Hamel

Director of Strategic Services - Create innovative company-wide solutions and digital measurement strategies to deliver maximum value and insight for Cardinal Path clients.

Share
Published by
Stéphane Hamel

Recent Posts

Google Delays Third-Party Cookie Deprecation to 2025

Google announced on April 23 that it will again delay third-party cookie deprecation (3PCD) in…

4 days ago

Understanding Funnel Reports in GA4

Funnel reports have long been one of the most actionable reports in a marketing analyst’s…

6 days ago

GA4 Monetization Reports: An Overview

GA4’s Monetization reports provide organizations with simple but actionable views into the revenue-generating aspects of…

2 weeks ago

This website uses cookies.