Due to the deluge of both work and extra posts, it’s been a while since my last regex post. I hope to change that this week with the next in our series of regex for beginners posts.
A few weeks ago Michael asked:
Would you be able to give examples of using RegEx to create a brand keywords only segment when the brand contains more than a single word?
Cardinal Path would be a good example to use as people might search on a variety of brand terms such as ‘cardinal path’, ‘cardinalpath’, ‘cardinalpath.com’ etc.
Is that possible using RegEx?
Absolutely. In fact, it’s pretty easy. The basic thing to keep in mind is alternation: a group of options separated by pipes, (keyword1|keyword2|keyword3), matches any one of them. In this post I’ll take you through the basics of filtering out multi-word keywords. This can be done using inline filters, or with advanced segments.
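To make Michael's question concrete, here is a minimal sketch in Python rather than in Google Analytics itself. It assumes the filter does a partial (search-style) match, and the keyword list is made up for illustration. Note that the cardinalpath option also catches cardinalpath.com, since the pattern only has to match part of the keyword.

```python
import re

# Alternation: match any one of the listed brand variants.
# Search-style (partial) matching is an assumption about how the filter behaves.
BRAND = re.compile(r"(cardinal path|cardinalpath)")

# Hypothetical keyword list, for illustration only.
keywords = [
    "cardinal path",
    "cardinalpath",
    "cardinalpath.com",
    "regex for beginners",
]

# Keep only the keywords that contain one of the brand variants.
branded = [k for k in keywords if BRAND.search(k)]
print(branded)
```
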
Sometimes you really don’t want to count certain keywords, like when you want to distinguish between branded and non-branded terms.
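Not matching a string in regex is simple: ^((?!cardinal).)*$
For those of you who aren’t familiar with ?!, it is a negative lookahead and essentially means “not followed by”. So at each position this pattern checks that the text ahead is not cardinal, then consumes one character, and repeats that 0 or more times from the start of the string to the end. The ^ and $ anchors matter: without them the pattern is happy to match an empty string, which means it matches everything.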
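If you want to see the lookahead behave before trusting it, Python's re module supports it. The sketch below is only an illustration outside Google Analytics. The helper name lacks_cardinal is made up, and the pattern is anchored with ^ and $ so that the whole keyword has to be free of cardinal.

```python
import re

# Anchored negative-lookahead pattern: every position from start to end
# must not be the beginning of "cardinal".
NOT_CARDINAL = re.compile(r"^((?!cardinal).)*$")

# The same idea without anchors, for comparison.
UNANCHORED = re.compile(r"((?!cardinal).)*")


def lacks_cardinal(keyword):
    """Return True when the keyword does not contain 'cardinal' (hypothetical helper)."""
    return NOT_CARDINAL.search(keyword) is not None


print(lacks_cardinal("google analytics training"))  # no brand term
print(lacks_cardinal("the cardinal path blog"))      # brand term in the middle

# Without anchors the pattern still "matches" a branded keyword,
# because it can match an empty string at the very first position.
print(UNANCHORED.search("cardinal path") is not None)
```
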
Of course this is not the right way to do it in Google Analytics, which will proceed to throw a hissy fit at you. I mean, Google already has an exclude rule, why are you using a RegEx exclude? (well, because I told you to, of course)
Not a huge deal, since Google offers a far superior way of doing negations: the “exclude” rule. You don’t even need RegEx for this; just put the keyword in and choose exclude.
But let’s say that, under some strange situation (like analyzing branded traffic) you actually want to remove more than just that keyword. Like let’s say your top keywords include a misspelling that is bringing in a bunch of traffic (how odd…)
So let’s try this: (cardinal|czrdinal). Just set the filter to exclude and pick the regex match type.
Now just for fun, let’s say that what I am looking for is what kind of topics people are coming to our site for. I don’t want our brand, and I don’t want people looking for Michael Straker or myself. So we could extend this as (cardinal|czrdinal|straker|clark). That takes out a bunch of names, and we could add more (for instance, we would probably want to remove Booth).
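Here is roughly what that exclude filter does, sketched in Python. The function name topic_keywords and the sample keywords are invented for the example, and partial matching is assumed, so clark would also knock out something like clarkson.

```python
import re

# The exclude pattern from the post: brand, misspelling, and author names.
EXCLUDE = re.compile(r"(cardinal|czrdinal|straker|clark)")


def topic_keywords(keywords):
    """Drop every keyword that contains one of the excluded terms (hypothetical helper)."""
    return [k for k in keywords if not EXCLUDE.search(k)]


# Hypothetical sample data, not real traffic.
sample = [
    "cardinal path",
    "czrdinal path blog",
    "michael straker",
    "regex for beginners",
    "advanced segments",
]

remaining = topic_keywords(sample)
print(remaining)
```
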
Finally, in my case I’m really only interested in what comes to the blog. If we had a normal structure (aka all blog posts fell under /blog/) we could create a segment that includes all pages matching the regex /blog.* and be done with it, but in our case our posts all fall under the root.
Our site only has the following non-blog pages, so it might be easiest just to exclude them. They are:
/ /webinars /contact /for-clients /what-we-do /who-we-are /training
Easy enough to do straight in the advanced segments panel. However, we don’t want that, it’s way too easy. We could use the following regex in theory: /(webinars|contact|for-clients|what-we-do|who-we-are|training)?
A slash followed by any one of a series of keywords, with that group included 1 or 0 times. Easy, no?
No, actually. That won’t work. The reason is simple: with nothing pinning it to the start and end of the page path, it will match anything that has a /. Ooops. Easy fix: ^/(webinars|contact|for-clients|what-we-do|who-we-are|training)?$
There we go. Match any page that starts with a slash, followed by at most one of those, and then ends. In theory you could leave out the caret, but I think the added specificity will likely work in our favour.
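The difference between the two versions is easy to check in Python. This is a sketch under assumptions: the anchored pattern is ^/(webinars|contact|for-clients|what-we-do|who-we-are|training)?$, matching is search-style, and the page paths are made up.

```python
import re

PAGES = r"(webinars|contact|for-clients|what-we-do|who-we-are|training)"

# First attempt: no anchors, so any path containing a slash matches.
LOOSE = re.compile(r"/" + PAGES + r"?")

# Anchored version: slash, at most one of the page names, then end of path.
STRICT = re.compile(r"^/" + PAGES + r"?$")

# Hypothetical page paths, for illustration only.
pages = [
    "/",
    "/contact",
    "/training",
    "/regex-for-beginners",
    "/contact?ref=nav",
]

loose_hits = [p for p in pages if LOOSE.search(p)]
strict_hits = [p for p in pages if STRICT.search(p)]

print(loose_hits)   # every path, since each one contains a slash
print(strict_hits)  # only the exact non-blog pages
```

Notice that /contact?ref=nav slips past the anchored pattern. A query string is enough to stop a non-blog page from being excluded, so it is worth running your real page list through the pattern.
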
Also a note: when setting up an advanced segment like this, make sure to test it. Go through your pages and make sure there are no misspellings or odd query strings that are letting false positives through.
Hopefully this gives you an idea of how useful RegEx can be when dealing with Google Analytics.
edit: Just a shout out to an amazing regex tool that I use when writing these posts: regexpal.