
A/B Testing: Clustered vs Clean Experiment Design

We’re currently designing an A/B test for a client’s product page. It’s the first experiment on the page, and we’ve recommended a number of changes.

The client pointed out that our recommended experiment contains, in effect, several variables. (Or as I called it, a “cluster” of variables.) How then will we know which variable had the greatest impact on the results?

A very good question. And the fact is, we won’t know which variable had the greatest impact.

For a first test, the goal is usually to achieve the greatest possible lift in performance. In most cases, you're not going to achieve that by making one isolated change (for example, the wording of the call to action or the color of a button). Usually, there'll be a combination of elements under review.
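To make that concrete, here's a minimal sketch of how a first, clustered test is read, using entirely hypothetical visitor and conversion numbers. The whole redesign enters a two-proportion z-test as a single treatment, which is exactly why the result can't be attributed to any one change within it.

    import math

    # Hypothetical numbers: 10,000 visitors per arm.
    # Control converts at 3.0%; the full redesign (the "cluster") at 3.9%.
    visitors_a, conversions_a = 10_000, 300   # control
    visitors_b, conversions_b = 10_000, 390   # redesigned variant

    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b

    # Two-proportion z-test. Note that the entire redesign is one
    # treatment: the z-score says whether the redesign as a whole won,
    # not which element inside it did the work.
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se

    print(f"Lift: {(p_b - p_a) / p_a:.1%}, z = {z:.2f}")  # z > 1.96 => significant at 95%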

How do you decide what to change? Well, it's a mixture of art and science. In reviewing a page, you consider established best practices and what you've learned from past experience, then form a hypothesis about how the page could be made to perform better. Just a few things you might consider:

  • Is there a simple, obvious call to action?
  • Why should the user do as you ask? What’s the payoff?
  • Does the page invoke urgency? Why should the user act now?
  • Are there any unnecessary distractions on the page?
  • Is there anything on the page that might undermine its trustworthiness or make the user hesitate?
  • Does the page communicate effectively with all the different personality types? (For example, Humanistic, Competitive, Methodical, and Spontaneous personalities.)
  • Are there any particular persuasion tactics that could be employed on the page? (For example, Social Proof, Liking, Authority, Reciprocity, the Contrast Principle… For more online persuasion ideas, see this post for a Persuasion Checklist.)

As you can imagine, you can usually spot a whole raft of issues. So on a first test, the redesigns are usually quite dramatic. You’ll have a large cluster of variables to test.

And yes, that means you won’t know which changes had the strongest impact. It’s even possible that some of your changes had a negative impact. From a scientific viewpoint, these experiments aren’t very “clean”.

But that’s what follow-up tests are for. You can’t expect to get all your answers from one test.
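As a sketch of what a follow-up might look like (again with purely hypothetical numbers): the winning redesign becomes the new control, and each follow-up test reverts or changes just one element from the original cluster, isolating its effect.

    import math

    def z_score(conv_a, n_a, conv_b, n_b):
        """Two-proportion z-test on conversion counts (A vs. B)."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        return (p_b - p_a) / se

    # Follow-up: did the new call-to-action wording matter on its own?
    # A = winning redesign as-is, B = winning redesign with the old CTA restored.
    z = z_score(390, 10_000, 330, 10_000)
    print(f"z = {z:.2f}")  # |z| > 1.96 suggests this one element carried real impact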

As I wrote years ago (in discussing when it is advisable to end a test early), I think we should ask ourselves why we do these experiments. Almost always, it's for marketing. The goal is to improve performance, not to advance scientific knowledge. "Cleanliness" comes a distant second to enhancing the bottom line.
