A/B Testing: Clustered vs Clean Experiment Design

We’re currently designing an A/B test for a client’s product page. It’s the first experiment on the page, and we’ve recommended a number of changes.

The client pointed out that our recommended experiment contains, in effect, several variables. (Or as I called it, a “cluster” of variables.) How then will we know which variable had the greatest impact on the results?

A very good question. And the fact is, we won’t know which variable had the greatest impact.

For a first test, the goal is usually to achieve the greatest possible lift in performance. In most cases, you’re not going to achieve that by making one isolated change (for example, the wording of the call to action or the color of a button). Usually, there’ll be a combination of elements under review.

How do you decide what to change? Well, it’s a mixture of art and science. In reviewing a page, you consider established best practices and what you’ve learned from past experience, then hypothesize as to how the page could be made to perform better. Just a few things you might consider:

  • Is there a simple, obvious call to action?
  • Why should the user do as you ask? What’s the payoff?
  • Does the page invoke urgency? Why should the user act now?
  • Are there any unnecessary distractions on the page?
  • Is there anything on the page that might undermine its trustworthiness or make the user hesitate?
  • Does the page communicate effectively with all different personality types? (For example, Humanistic, Competitive, Methodical and Spontaneous personalities?)
  • Are there any particular persuasion tactics that could be employed on the page? (For example, Social Proof, Liking, Authority, Reciprocity, The Contrast Principle… For more online persuasion ideas, see this post for a Persuasion Checklist.)

As you can imagine, you can usually spot a whole raft of issues. So on a first test, the redesigns are usually quite dramatic. You’ll have a large cluster of variables to test.

And yes, that means you won’t know which changes had the strongest impact. It’s even possible that some of your changes had a negative impact. From a scientific viewpoint, these experiments aren’t very “clean”.

But that’s what follow-up tests are for. You can’t expect to get all your answers from one test.
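To make that concrete, here is a minimal sketch of what a clustered test can and can’t tell you. The visitor and conversion counts are hypothetical numbers chosen purely for illustration, and the function names are my own. The two-proportion z-test is one common way to check significance, not necessarily the one your testing tool uses.

```python
import math

def relative_lift(control_rate, variant_rate):
    """Relative improvement of the variant over the control."""
    return (variant_rate - control_rate) / control_rate

def two_proportion_z(c_conv, c_vis, v_conv, v_vis):
    """z statistic for the difference between two conversion rates."""
    pooled = (c_conv + v_conv) / (c_vis + v_vis)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / c_vis + 1 / v_vis))
    return (v_conv / v_vis - c_conv / c_vis) / std_err

# Hypothetical results: (conversions, visitors). Not real client data.
control = (200, 10_000)   # original page
variant = (260, 10_000)   # redesigned page with the whole cluster of changes

control_rate = control[0] / control[1]
variant_rate = variant[0] / variant[1]

lift = relative_lift(control_rate, variant_rate)
z = two_proportion_z(*control, *variant)

# The test yields one number for the whole cluster of changes.
# Nothing here attributes the lift to the headline, the button,
# or any other individual change.
print(f"Combined lift: {lift:.1%}")
print(f"z statistic: {z:.2f}")
```

Notice that the output describes the redesign as a whole. To learn how much a single change contributed, you would run a follow-up test that varies only that element against the new winning page.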

As I wrote years ago (in discussing when it is advisable to end a test early), I think we should ask ourselves why we do these experiments. Almost always, it’s for marketing. The goal is to improve performance, not to advance scientific knowledge. “Cleanliness” takes a distant second to enhancing the bottom line.

Michael Straker
