A/B testing or split testing is a powerful tool that can help you improve your core business metrics. Experimenting is at the core of many of today’s most successful companies. Creative organizations like Amazon, Booking or Netflix make management decisions based on data and experiments. Those companies appreciate the value of small changes and the impact they can have.
In this blog article, we will answer the following questions:
- What is A/B Testing?
- Why should you A/B test?
- What can be tested?
- What should you A/B test, and what not?
- How to define your success metrics?
- How long should an A/B test run?
- How to set up an A/B test?
- How to analyze your test results?
What is A/B Testing?
A/B testing is an experiment in which you simultaneously show two or more versions of a page or element to different segments of users to determine which version has more impact on a business metric.
Let’s walk through an example: You want to test the main headline on your page to find out if more people click the call-to-action button. By changing the headline, you expect your click-through rate to improve from 2.3% to 3.1% (this is your hypothesis). 50% of your visitors will see the original headline (the control), and the other 50% will see a different variation of your headline (the treatment). You let the test run until you have enough visitors on your page to determine which variation works better. The winning headline then becomes the new champion.
Elements of a test:
|Hypothesis|A testable proposition. What impact (expressed as a measurable metric) do you expect from this change?|
|Success metric|The metric that best measures the expected impact.|
|A|The control or champion.|
|B|The treatment, also called the variation.|
|A/B test|A test with one treatment.|
|A/B/C, A/B/n or multivariate test|A test with more than one treatment.|
Why should you A/B test?
A/B testing allows businesses, teams and individuals to make decisions based on data instead of relying on gut feeling. You can develop a hypothesis based on data, insights or observation, which can be proven right or wrong.
“Success is the ability to go from failure to failure without losing enthusiasm.” – Winston Churchill
You can test changes to your user experience, business model or product and measure the result. It helps you continuously improve the usability of your interface and your core business metrics.
It can also be a great way of de-risking new features. Instead of releasing a feature to your entire customer base, you can first test what effect it has on users and core metrics.
A/B tests can also help you decide how much you can invest in something. A great example comes from the search engine Bing. They knew that when searching for information, speed is essential to users. But how critical is it? What’s the exact impact? How fast do search results need to display? They ran an A/B test with artificial delays and found that a 100-millisecond difference in performance led to a 0.6% impact on revenue. That’s $18M in annual revenue. A clear investment case for speed!
What can be tested?
Nearly everything can be A/B tested. You can run A/B tests to optimize customer experiences, software changes, and business models.
- All user interfaces (web pages, app screens, etc.)
- Page elements like images, headlines, buttons, etc.
- Storytelling & copy
- New features
- Business models
- Back-end such as algorithm changes
- Newsletters: subject lines, content, sender, etc.
- Ads: targeting, creatives and bidding
- and much more
What should you A/B test, and what not?
Let’s A/B test the button colour! 🟦 🟩 I often hear that, but I’ve rarely seen an impact. Yes, you can test everything that comes to mind. However, what is worth the A/B test?
Suppose you don’t have thousands of visitors on your site and are not even sure about your product market fit. In that case, testing the button colour is probably not a good idea.
Don’t get lost in testing trivial changes, but rather ask yourself: are we using A/B tests to optimize customer experiences, software changes or business models? Do we have a strong hypothesis behind the test that will impact our business metrics?
But what is a strong hypothesis? That one is a bit more complex to answer! Ask yourself: What’s the driver behind the change that should impact your business metrics?
Doing an A/B test because you can’t decide if you like blue or green more is not a strong hypothesis. On the other hand, if you assume that users are not taking the intended action because the button is not recognizable enough, you can justify the test.
Let’s go one step further. Let’s say you have a brand that wants to be perceived as affordable, but you only use sophisticated, luxurious colours (black, purple, etc.). Running an A/B test where you use yellow and red (colours that signal promotions and are associated with inexpensive products) to test how users perceive your brand is a far stronger hypothesis.
Of course, minor changes can have a significant impact and should be appreciated. If you reach a particular stage where you have an established and well-running product or business, you should test as much as possible. But when you are at the beginning and have limited traffic and conversions, you should first try to get answers to your most important questions.
How to define your success metrics?
It is crucial to define success metrics for your experiment. Otherwise, you can’t measure the impact. Your success metric could be revenue, conversion rate, usage rates, CTR, time spent, etc.
Some questions to ask yourself:
- What metric measures the impact?
- How volatile or stable is this metric?
- Is it a short-term or long-term metric? Are short-term metrics good predictors for long-term outcomes?
How long should an A/B test run?
Well, there is no one-size-fits-all answer. In general: the larger the expected impact, the smaller the sample you need.
How long it needs to run depends on your number of visitors and expected conversions. Thankfully there are lots of calculators out there to help you answer this question.
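If you are curious what such a calculator does under the hood, here is a minimal sketch. It assumes a two-sided test comparing two conversion rates, a 5% significance level and 80% power, which are common defaults but not the only reasonable choice. The function names and the traffic figure in the usage example are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate visitors needed per variation to detect the given difference.

    Uses the standard normal-approximation formula for comparing two proportions.
    alpha and power are assumed defaults, not values from the article.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    p_avg = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_avg * (1 - p_avg))
        + z_power * sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


def days_to_run(visitors_per_day, n_per_variation, variations=2):
    """Rough test duration, assuming traffic is split evenly across variations."""
    return ceil(n_per_variation * variations / visitors_per_day)


# The headline example from above: click-through rate from 2.3% to 3.1%.
needed = sample_size_per_variation(0.023, 0.031)
print(needed, "visitors per variation")
print(days_to_run(visitors_per_day=1000, n_per_variation=needed), "days at 1,000 visitors/day")
```

For the headline example, this comes out at several thousand visitors per variation. Try a bigger expected uplift and you will see the required sample shrink quickly.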
If you don’t have enough users or conversions on your page to get to any learnings in a reasonable timeframe, think of other ways to optimize your customer experience. You can do a heuristic analysis, get an expert opinion, or do other user tests to improve your performance.
How to set up an A/B test?
There are two options: build your own experiment setup or use an existing tool. Building your own infrastructure gives you complete control. It can make sense once you reach a certain scale or number of experiments. To get started, I would recommend using an existing tool.
To test your setup, you can run an A/A test, where both groups see the same version, to make sure your traffic is split correctly.
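If you do build your own setup, the core piece is assigning each user to a variation. Below is a minimal sketch of one common approach, hashing the user ID, so that the same user always sees the same variation. The function names, the experiment name and the simulated user IDs are assumptions for illustration, not part of any specific tool.

```python
import hashlib


def assign_variation(user_id, experiment, variations=("control", "treatment")):
    """Deterministically map a user to a variation by hashing experiment name and user ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variations[int(digest, 16) % len(variations)]


def split_counts(user_ids, experiment, variations=("control", "treatment")):
    """Count how many users land in each variation, e.g. for an A/A sanity check."""
    counts = {variation: 0 for variation in variations}
    for user_id in user_ids:
        counts[assign_variation(user_id, experiment, variations)] += 1
    return counts


# A/A-style check with simulated user IDs: the split should be close to 50/50.
counts = split_counts(range(10_000), "aa-test")
for variation, count in counts.items():
    print(f"{variation}: {count} users ({count / 10_000:.1%})")
```

Including the experiment name in the hash means a user who lands in the treatment of one experiment is not automatically in the treatment of the next one. If the split in your A/A test is clearly off from what you configured, fix the setup before you trust any results.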
How to analyze your test results?
Have you run your first experiment? Well done! But is the result significant or not?
With a statistically significant result, you can feel confident that you weren’t just lucky with the sample and that the difference is, with high probability, real. To limit the risk of a Type I error (finding an effect where there is none), you calculate the p-value. The p-value tells you how likely a difference at least this large would be if there were no real difference between the variations. It helps you judge whether the change is out of the ordinary or just typical noise.
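For the curious, here is a minimal sketch of how a p-value for two conversion rates can be calculated. It assumes a two-sided two-proportion z-test, which is one common choice; the calculator or tool you use may rely on a different method. The visitor and conversion numbers are made up to match the headline example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, p_value) for a two-sided test of B's conversion rate against A's."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the shared conversion rate if there were no real difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        raise ValueError("Cannot test: no variation in the data (all or no users converted).")
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical result: 230 of 10,000 visitors clicked in A, 310 of 10,000 in B.
z, p_value = two_proportion_z_test(230, 10_000, 310, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant at the 5% level")
```

A p-value below your chosen threshold (5% is a common convention) means a difference this large would be unlikely if the headline had no effect.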
Good news! You don’t need to dig into statistics but can use one of the many online calculators. I especially like this one since it gives you more insights.
You can also see probabilities directly in tools such as Google Optimize.
Pro tip for Google Optimize: If you want to dig deeper into how users in each variation behave on your web page, you can create segments in Google Analytics with one click.
Those are the basics – if you have questions or need help getting started on A/B testing, feel free to get in touch!