quick results aren't always good results
On March 9th, 2009 we launched an A/B/n conversion rate optimization experiment, matching the original test page against five other variations. Within 24 hours Google Website Optimizer had declared a winner: a combination with a 98% chance of beating the original and a 116% observed improvement in conversion rate. We were very excited about the results, and about the speed with which we achieved them.
Our excitement was quickly tempered, though: we knew that declaring a winner after one day, with the sample size the test had seen, is not ideal. Even though there is a 98% chance that Combination #1 will beat the original, there is a 2% chance that it won't, and a chance that a different test combination will win.
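Google's "chance to beat original" is essentially a Bayesian probability, and a short simulation shows why a small sample makes it shaky. This sketch (plain Python; the visitor and conversion counts are invented for illustration, not our actual test data) estimates the chance that a variation beats the original by sampling from Beta posteriors:

```python
import random

def chance_to_beat(orig_conv, orig_visitors, var_conv, var_visitors,
                   draws=100_000, seed=42):
    """Estimate P(variation rate > original rate) by Monte Carlo.

    Uses a uniform Beta(1, 1) prior on each conversion rate, so each
    posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_orig = rng.betavariate(1 + orig_conv, 1 + orig_visitors - orig_conv)
        p_var = rng.betavariate(1 + var_conv, 1 + var_visitors - var_conv)
        if p_var > p_orig:
            wins += 1
    return wins / draws

# Same observed rates (5% vs. 11%), very different sample sizes:
small = chance_to_beat(5, 100, 11, 100)        # 100 visitors per combination
large = chance_to_beat(50, 1000, 110, 1000)    # 1,000 visitors per combination
print(small, large)
```

The same observed lift that looks decisive at 1,000 visitors per combination is far less conclusive at 100, which is why a high confidence number after only 24 hours deserves skepticism.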
So before you get too excited with your conversion rate optimization tests and implement the winning combinations that Google Website Optimizer has selected, keep these items in mind:
- How much traffic has each combination received?
- How many days has the test been running?
- What cyclical effects does your site experience? By this I mean: how does behavior change during business hours vs. after business hours? How does behavior vary by day of the week? How does behavior vary around the 1st and the 15th of the month (pay cycles)?
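One way to keep yourself honest on these points is to encode them as a gate a test must pass before you call a winner. The thresholds below (1,000 visitors per combination, at least one full week, measured in whole weeks so every day of the week is covered equally) are my own rough rules of thumb, not anything Google Website Optimizer enforces:

```python
def ready_to_call_winner(min_combo_visitors, days_running,
                         visitor_floor=1000, day_floor=7):
    """Rough sanity gate before trusting a declared winner.

    min_combo_visitors: traffic seen by the *least*-trafficked combination.
    days_running: how long the test has been live.
    """
    enough_traffic = min_combo_visitors >= visitor_floor
    long_enough = days_running >= day_floor
    whole_weeks = days_running % 7 == 0  # cover each weekday equally
    return enough_traffic and long_enough and whole_weeks

# Our situation after 24 hours: plenty of confidence from the tool,
# but nowhere near enough data or days to trust it.
print(ready_to_call_winner(min_combo_visitors=150, days_running=1))   # False
print(ready_to_call_winner(min_combo_visitors=2500, days_running=7))  # True
```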
In my opinion it is best to run your tests for at least one week, and in many cases we recommend one month (usually because the test's sample size is not large). This is interesting because it introduces subjectivity, when what we are trying to do with these conversion rate optimization tests is to be empirical. The testing itself is definitely empirical, but you have to account for outside influences that you are aware of and the testing platform isn't.
Now, one of the tactics you can deploy to help increase the statistical confidence of your test is to prune out combinations that aren't performing well and aren't likely to win. This drives more traffic to the remaining combinations (and the original), so you can increase your statistical confidence more quickly.
Google will tell you that it has found “High-confidence winners” when a combination has a 98% or higher likelihood of beating the original. This can be read as a clue to which combinations should continue in the test and which should be pruned. You will want to be careful with this, as there may be other combinations that are performing well but haven't yet made the cut, and that you should leave in the test. This would have been the case for us: after 24 hours we had one combination that had met Google's hurdle rate, but after 36 hours we had two. (One thing to remember about pruning: you can't remove the original from the test, and it will often be one of the combinations that is performing very poorly.)
In our case we pruned the test after 36 hours. After pruning, the test included only the original and the two combinations with a 98% or higher likelihood of beating it. The next best performing combination (which we did prune) had an observed improvement of 30.5% and a 72.2% chance to beat the original. We pruned it because, even though it was likely to beat the original, it wasn't likely to beat the two combinations remaining in the test.
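Our pruning logic boils down to a simple filter over each combination's chance to beat the original. The 116% / 98%+ and 30.5% / 72.2% figures below are the ones quoted above; the other combinations' numbers are made up to fill out the example, and the 70% "review before pruning" band is my own judgment call, not a Google Website Optimizer feature:

```python
# (name, observed improvement %, chance to beat original %)
combinations = [
    ("Combination 1", 116.0, 98.6),  # our 98%+ high-confidence winner
    ("Combination 2", 88.0, 98.2),   # illustrative second high-confidence winner
    ("Combination 3", 30.5, 72.2),   # the next-best combination, as quoted above
    ("Combination 4", 4.0, 41.0),    # illustrative
    ("Combination 5", -9.0, 17.0),   # illustrative
]

HURDLE = 98.0   # Google's "high-confidence winner" threshold
REVIEW = 70.0   # my own cutoff: worth a second look before pruning

keep = [name for name, _, ctbo in combinations if ctbo >= HURDLE]
review = [name for name, _, ctbo in combinations if REVIEW <= ctbo < HURDLE]
prune = [name for name, _, ctbo in combinations if ctbo < REVIEW]

# The original can never be pruned, so it always stays in the test.
keep.append("Original")
print("keep:", keep)
print("review, then decide:", review)
print("prune:", prune)
```

In our test the 72.2% combination fell in that review band; we pruned it anyway because, while likely to beat the original, it was unlikely to beat the two remaining high-confidence combinations.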
So now that we have pruned this Google Website Optimizer test, we will let it run for at least another 5.5 days and see what happens (a one-week test in total). Our expectation is that this conversion rate optimization experiment will deliver at least a 100% improvement over the original, with at least a 98% chance to beat it.