
A/B Testing: Clustered vs Clean Experiment Design

We’re currently designing an A/B test for a client’s product page. It’s the first experiment on the page, and we’ve recommended a number of changes.

The client pointed out that our recommended experiment contains, in effect, several variables. (Or as I called it, a “cluster” of variables.) How then will we know which variable had the greatest impact on the results?

A very good question. And the fact is, we won’t know which variable had the greatest impact.

For a first test, the goal is usually to achieve the greatest possible lift in performance. In most cases, you're not going to achieve that by making one isolated change (for example, the wording of the call to action, or the color of a button). Usually, there'll be a combination of elements under review.
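To make that concrete, here's a rough sketch (in Python, with made-up numbers) of how a clustered test is typically read: one aggregate comparison of conversion rates between the control and the redesigned page. The function and the figures are illustrative assumptions, not output from any particular testing tool.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rate of a control page (A) against a
    redesigned page (B). Returns the relative lift and a two-sided
    p-value. With a clustered redesign, this measures the combined
    effect of all the changes at once; it can't attribute the lift
    to any single element."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return (p_b - p_a) / p_a, p_value

# Hypothetical numbers: 4% conversion on the control, 5% on the redesign.
lift, p = two_proportion_z_test(conv_a=400, n_a=10_000, conv_b=500, n_b=10_000)
print(f"lift: {lift:.1%}, p-value: {p:.4f}")
```

Whatever the numbers, the result is a single lift figure for the whole redesign. That's the trade-off the client was asking about.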

How do you decide what to change? Well, it’s a mixture of art and science. In reviewing a page, you consider established best practices and what you’ve learned from past experience, then hypothesize as to how the page could be made to perform better. Just a few things you might consider:

  • Is there a simple, obvious call to action?
  • Why should the user do as you ask? What’s the payoff?
  • Does the page invoke urgency? Why should the user act now?
  • Are there any unnecessary distractions on the page?
  • Is there anything on the page that might undermine its trustworthiness or make the user hesitate?
  • Does the page communicate effectively with all different personality types (for example, Humanistic, Competitive, Methodical, and Spontaneous personalities)?
  • Are there any particular persuasion tactics that could be employed on the page? (For example, Social Proof, Liking, Authority, Reciprocity, the Contrast Principle… For more online persuasion ideas, see this post for a Persuasion Checklist.)

As you can imagine, you can usually spot a whole raft of issues. So on a first test, the redesigns are usually quite dramatic. You’ll have a large cluster of variables to test.

And yes, that means you won’t know which changes had the strongest impact. It’s even possible that some of your changes had a negative impact. From a scientific viewpoint, these experiments aren’t very “clean”.
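To see why a "clean" design is rarely practical on a first test, consider what isolating every change would cost. The sketch below (again in Python, with assumed traffic figures) counts the variants a full-factorial design would need for a hypothetical cluster of five changes.

```python
from itertools import product

# Hypothetical cluster of changes under review on the page.
changes = ["headline", "cta_wording", "button_color", "social_proof", "urgency"]

# A "clean" full-factorial design tests every on/off combination,
# so each change's individual effect can be isolated.
variants = list(product([0, 1], repeat=len(changes)))
print(f"{len(changes)} changes -> {len(variants)} variants")  # 5 changes -> 32 variants

# At a rough (assumed) 10,000 visitors per variant for a readable result,
# that's 320,000 visitors -- versus 20,000 for a simple
# control-vs-redesign test of the whole cluster.
print(f"traffic needed: ~{len(variants) * 10_000:,} visitors")
```

That traffic bill is why, in practice, you test the whole cluster first and isolate the promising pieces later.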

But that’s what follow-up tests are for. You can’t expect to get all your answers from one test.

As I wrote years ago (in discussing when it's advisable to end a test early), I think we should ask ourselves why we run these experiments. Almost always, it's for marketing. The goal is to improve performance, not to advance scientific knowledge. "Cleanliness" comes a distant second to enhancing the bottom line.
