
“Did You A/B Test It?”

After launching a feature, coworkers often ask me, “Did you A/B test it?” While the question is well-meaning, A/B testing isn’t the only way, or even the best way, of making data-informed decisions in product development. In this post, I’ll explain why, and provide other ways of validating hypotheses to assure your coworkers that a feature was worth building.

Implied Development Process

My coworkers’ simple question implies a development process that looks like this:

  1. You have an idea for a new feature
  2. You build the new feature
  3. You A/B test it to prove its success
  4. Profit! High fives! Release party!

While this looks reasonable on the surface, it has a few flaws.

Flaw 1: What metric are you measuring?

The A/B test in step 3 implies that you’re comparing a version of the product with the new feature to a version without the new feature. But a key part of running an A/B test is choosing a metric to call the winner, which is where things get tricky. Your instinct is probably to measure usage of the new feature. But this doesn’t work because the control lacks the feature, so it loses before the test even begins.

There are, however, higher-level metrics you care about. These could range from broad business metrics, like revenue or time in product, to narrower metrics, like completing a specific task (such as successfully booking a place to stay, in the case of AirBnB). Generally speaking, broader metrics are slower to move and influenced by more factors, so narrower metrics are better for detecting a single feature’s impact.

Even so, this type of experiment isn’t what A/B testing excels at. At its core, A/B testing is a hill climbing technique. This means it’s good at telling you if small, incremental changes are an improvement (in other words, each test is a step up a hill). Launching a feature is more like exploring a new hill. You’re giving users the ability to do something they couldn’t do before. A/B testing isn’t good at comparing hills to each other, nor will it help you find new hills.
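
To make “calling a winner” concrete, here’s a minimal sketch of what that comparison might look like on a narrow metric, using a two-proportion z-test in Python. The metric, the visitor counts, and the choice of test are illustrative assumptions, not something from a real experiment.

    # A minimal, illustrative comparison: did the variant move a narrow
    # conversion metric (e.g. "booked a place to stay")? The counts are made up.
    from math import sqrt
    from statistics import NormalDist

    control_conversions, control_visitors = 1_205, 24_310
    variant_conversions, variant_visitors = 1_330, 24_495

    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors

    # Two-proportion z-test: pooled rate and standard error under "no difference"
    pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    z = (p_variant - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    print(f"control {p_control:.2%} vs. variant {p_variant:.2%}: z = {z:.2f}, p = {p_value:.3f}")

Even with a clean result like this, the test only tells you whether the metric moved, not whether the feature itself is any good, which is exactly the limitation described above.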

Flaw 2: What if the new feature loses?

Let’s say you have good metrics to measure, and enough traffic to run the test in a reasonable timeframe. But the results come back, and the unthinkable has happened: your new feature lost. There’s no profit, high fives, or release party. Now what do you do?

Because of sunk costs, your instinct will be to keep improving the feature until it wins. But an A/B test doesn’t tell you why it lost. Maybe there was a minor usability problem, or maybe the feature is fundamentally flawed. Whatever the problem is, an A/B test won’t tell you what it is, so it can’t help you fix it.

The worst-case scenario is that the feature doesn’t solve a real problem, in which case you should remove it. But this is an expensive option because you spent the time to design, build, and launch it before learning it wasn’t worth building. Ideally you’d discover this earlier.

Revised Development Process

When our well-meaning coworker asks if we A/B tested the new feature, what they really want to know is whether we have data to back up that the feature was worth building. To them, an A/B test is the only way they know to answer that question. But as user experience professionals, we know there are plenty of methods for gathering data to guide our designs. Let’s revise the product development process from above:

  1. You have an idea for a new feature.
  2. You scope the problem the feature is supposed to solve by interviewing users, sending out surveys, analyzing product usage, or using other research methods.
  3. You create prototypes and show them to users.
  4. You refine the design based on user feedback.
  5. You repeat steps 3 and 4 until you’re confident the design solves the problem you set out to solve.
  6. You build the feature.
  7. You do user testing to find and fix usability flaws.
  8. You release the feature via a phased rollout (or a private/public/opt-in beta) and measure your key metrics to make sure they’re within normal parameters (a rough sketch of this step follows the list).
    • This can be run as an A/B test, but doesn’t need to be.
  9. Once you’re confident the feature is working as expected, fully launch it to everyone.
  10. Profit! High fives! Release party!
  11. Optimize the feature by A/B testing incremental changes.
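
As a rough illustration of step 8, here’s a minimal sketch of how a phased-rollout check might work, assuming users are bucketed by a hash of their ID so a small, stable percentage sees the feature. The function, feature name, and percentages are hypothetical, and most feature-flag tools provide this out of the box.

    # Hypothetical sketch of a phased-rollout gate; names and thresholds are
    # made up, and a real feature-flag system would handle this for you.
    import hashlib

    def in_rollout(user_id: str, feature: str, percent: float) -> bool:
        """Deterministically bucket a user so the same people always see the feature."""
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF  # maps the user to a value in [0, 1]
        return bucket < percent / 100

    ROLLOUT_PERCENT = 10  # ramp gradually: 1% -> 10% -> 50% -> 100%

    if in_rollout(user_id="user-42", feature="new-booking-flow", percent=ROLLOUT_PERCENT):
        pass  # render the new feature
    else:
        pass  # render the existing experience

The point is that the rollout percentage ramps up gradually while you watch your key metrics; the comparison against “normal parameters” can be as formal as an A/B test or as simple as a dashboard check.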

In this revised development process (commonly called user-centered design), you’re gathering data every step of the way. Rather than building a feature and “validating” it at the end with an A/B test, you’re continually refining what you’re building based on user feedback. By the time you release it, you’ve iterated countless times and are confident it’s solving a real problem. And once it’s built, you can use A/B testing to do what A/B testing does best — optimization.

A longer process? Yes. A more confident, higher-quality launch? Also yes.


Now when your coworkers ask if you A/B tested your feature, you can reply, “No, but we made data-informed decisions that told us users really want this feature. Let me show you all of our data!” By using research and A/B testing appropriately, you’ll build features that your users and your bottom line will love.

Further Reading

If you’d like to learn how other companies incorporate A/B testing into their development process, or about user-centered design in general, these articles are great resources:

Thanks to Kyle Rush, Olga Antonenko Young, and Silvia Amtmann for providing feedback on earlier drafts of this post.
