Laura Klein

A/B and Qualitative User Testing

Recently, I worked with a company devoted to A/B testing. For those of you who aren’t familiar with it, A/B testing (sometimes called bucket testing or multivariate testing) is the practice of creating multiple versions of a screen or feature and showing each version to a different set of users in production in order to find out which version produces better metrics. These metrics may include things like “which version of a new feature makes the company more money” or “which landing screen positively affects conversion.” Overall, the goal of A/B testing is to let you make better product decisions about the things that are important to your business by using statistically significant data.
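To make the mechanics concrete, here is a minimal sketch of how a site might split traffic into buckets. Everything here (the function name, the hashing scheme, the experiment name) is an illustrative assumption, not how any particular product implements it:

```python
import hashlib

def assign_variant(user_id, experiment, variants):
    """Deterministically assign a user to one variant of an experiment.

    Hashing the user id together with the experiment name means a
    given user always sees the same version, and different experiments
    split the user base independently of one another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# A 50/50 A/B split: roughly half of users land in each bucket.
counts = {"A": 0, "B": 0}
for uid in range(10_000):
    counts[assign_variant(uid, "landing_page", ["A", "B"])] += 1
```

Making the assignment deterministic rather than random matters: a user who comes back tomorrow should see the same version they saw today, or the metrics for each bucket become muddled.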

Qualitative user testing, on the other hand, involves showing a product or prototype to a small number of people while observing and interviewing them. It produces a different sort of information, but the goal is still to help you make better product decisions based on user feedback.

Now, a big part of my job involves talking to users about products in qualitative tests, so you might imagine that I would hate A/B testing. After all, wouldn’t something like that put somebody like me out of a job? Absolutely not! I love A/B testing. It’s a phenomenal tool for making decisions about products. It is not the only tool, however. In fact, qualitative user research combined with A/B testing creates the most powerful system for informing design that I have ever seen. If you’re not doing it yet, you probably should be.

A/B Testing

What It Does Well

A/B testing on its own is fantastic for certain things. It can help you:

  • Get statistically significant data on whether a proposed new feature or change significantly increases metrics that matter – numbers like revenue, retention, and customer acquisition
  • Understand more about what your customers are actually doing on your site
  • Make decisions about which features to cut and which to improve
  • Validate design decisions
  • See which small changes have surprisingly large effects on metrics
  • Get user feedback without actually interacting with users

For example, imagine that you are creating a new check out flow for your website. There is a request from your marketing department to include an extra screen that asks users for some demographic information. However, you feel that every additional step in a check out process represents a chance for users to drop out, which prevents purchases. By creating two flows in production, one with the extra screen and one without, and showing each flow to only half of your users, you can gather real data on how many purchases are completed by members of each group. This allows you to understand the exact impact on sales and helps you decide whether gathering the demographic information is really worth the cost.
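Deciding whether the gap in completed purchases between the two flows is real or just noise is a standard statistics question. A rough sketch of a two-proportion z-test (all the numbers below are made up for illustration):

```python
from math import sqrt

def conversion_z_test(purchases_a, users_a, purchases_b, users_b):
    """Two-proportion z-test: is the gap between two purchase rates
    larger than random noise would explain?"""
    p_a = purchases_a / users_a
    p_b = purchases_b / users_b
    # Pooled rate under the null hypothesis that both flows convert equally
    pooled = (purchases_a + purchases_b) / (users_a + users_b)
    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    return (p_a - p_b) / se

# Made-up numbers: the flow without the extra screen converts at 6%,
# the flow with it at 5%, with 10,000 users in each bucket.
z = conversion_z_test(600, 10_000, 500, 10_000)
# |z| > 1.96 means the difference is significant at the 95% level;
# here z is about 3.1, so the extra screen really is costing sales.
```

In practice you would reach for a stats library rather than hand-rolling this, but the point is the same: the A/B test turns "I feel that extra screens cause drop-off" into a number you can act on.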

Even more appealing, you can get all this user feedback without ever talking to a single user. A/B testing is, by its nature, an engineering solution to a product design problem, which makes it very popular with small, engineering-driven startups. Once the various versions of the feature are released to users, almost anybody can look at the results and understand which option is doing better, so it can all be done without having to recruit or interview test participants.

Of course, A/B testing in production works best on things like web or mobile applications where you can not only show different interfaces to different customers, but where you can also easily switch all of your users to the winning interface without having to ship them a new box full of software or a new physical device. I wouldn’t recommend trying it if you’re designing, for example, a car.

What It Does Poorly

Now imagine that, instead of adding a single screen to an already existing check out flow, you are tasked with designing an entirely new check out flow that should maximize revenue and minimize the number of people who abandon their shopping carts. In creating the new flow, there are hundreds of design decisions you need to make, both small and large. How many screens should it have? How much up-selling and cross-selling should you do? At what point in the flow do you ask users for payment information? What should the screens look like? Should they have the standard header and footer, or should those be removed to minimize potential distractions for users when purchasing? And on and on and on…

These are all just a series of small decisions, so, in an ideal world, you’d be able to A/B test each one separately, right? Of course, in the real world, this could mean creating an A/B test with hundreds of different variations, each of which has to be shown to enough users to achieve statistical significance. Since you want to roll out your new check out process sometime before the next century, this may not be a particularly appealing option.
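Some back-of-envelope arithmetic shows why the numbers get out of hand. The decision count, baseline rate, and sample-size rule of thumb below are all illustrative assumptions:

```python
# Suppose the new flow involves eight independent yes/no design
# decisions (header or not, up-selling or not, and so on).
# Testing every combination separately means 2**8 variants.
variants = 2 ** 8  # 256 versions of the flow

# A common rule of thumb for a two-proportion test at roughly 80%
# power and 95% confidence: n ~ 16 * p * (1 - p) / delta**2 users
# per variant, where p is the baseline conversion rate and delta is
# the smallest lift you care to detect.
p, delta = 0.05, 0.01  # 5% baseline, detect a one-point change
per_variant = round(16 * p * (1 - p) / delta ** 2)  # about 7,600 users
total_users = variants * per_variant                # about 1.9 million
```

Nearly two million users just to test one checkout redesign, before accounting for the engineering cost of building all 256 versions.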

A Bad Solution

Another option would be to fully implement several very different directions for the check out screens and test them all against one another. For example, let’s say you implemented four different check out processes with the following features to test against one another:

Option 1:
  • Yellow Background
  • Three Screens
  • Marketing Questions
  • No Up-selling
  • No Cross-Selling
  • Header
  • No Footer
  • Help Link

Option 2:
  • Blue Background
  • Two Screens
  • No Marketing Questions
  • Up-selling
  • No Cross-Selling
  • Header
  • Footer
  • No Help

Option 3:
  • Orange Background
  • Four Screens
  • Marketing Questions
  • Up-selling
  • Cross-Selling
  • No Header
  • Footer
  • Live Chat Help

Option 4:
  • White Background
  • One Screen
  • No Marketing Questions
  • No Up-selling
  • Cross-Selling
  • No Header
  • No Footer
  • Live Chat Help

This might work in companies that have lots of bored engineers sitting around waiting to implement and test several different versions of the same code, most of which will eventually be thrown away. Frankly, I haven’t run across a lot of those companies. But even if you did decide to devote the resources to building four different check out flows, the big problem is that, if you get a clear winner, you still don’t have a very clear idea of WHY users preferred that version of the check out flow over the others. Sure, you can make educated guesses. Perhaps it was the particularly soothing shade of blue. Or maybe it was the fact that there weren’t any marketing questions. Or maybe it was the aggressive up-selling. Or maybe that version just had the fewest bugs.

But the fact is, unless you figure out exactly which parts users actually liked and which they didn’t like, it’s impossible to know that you’re really maximizing your revenue. It’s also impossible to use those data to improve other parts of your site. After all, what if people HATE the soothing shade of blue, but they like everything else about the new check out process? Think of all the money you’ll lose by not going with the yellow or orange or white. Think of all the time you’ll waste by making everything else on your site that particular shade of blue, since you think that you’ve statistically proven that people love it!

What Qualitative Testing Does Well

Despite the many wonderful things about A/B testing, there are a few things that qualitative testing just does better.

Find the Best of All Worlds

Qualitative testing allows you to test wildly different versions of a feature against one another and understand what works best about each of them, thereby helping you develop a solution that has the best parts from all the different options. This is especially useful when designing complicated features that require many individual decisions, any one of which might have a significant impact on metrics. By observing users interacting with the different versions, you can begin to understand the pros and cons of each small piece of the design without having to run each one individually in its own A/B test.

Find Out WHY Users Are Leaving

While a good A/B test (or plain old analytics) can tell you which page a user is on when they abandon a check out flow, it can’t tell you why they left. Did they get confused? Bored? Stuck? Distracted? Information like that helps you make better decisions about what exactly it is on the page that is causing people to leave, and watching people use your feature is the best way to gather that information.

Save Engineering Time and Iterate Faster

Generally, qualitative tests are run with rich, interactive wireframes rather than fully designed and tested code. This means that, instead of having your engineers code and test four different versions of the flow, you can have a designer create four different HTML prototypes in a fraction of the time. HTML prototypes are significantly faster to produce since:

  • They don’t have to run in multiple browsers, just the one you’re testing
  • They don’t have any backend code that needs to be done
  • They frequently don’t have a polished visual design (unless that’s part of what you’re testing)

And since making changes to a prototype doesn’t require any engineering or QA time, you can iterate on the design much faster, allowing you to refine the design in hours or days rather than weeks or months.

How Do They Work Together?

Qualitative Testing Narrows Down What You Need to A/B Test

Qualitative testing will let you eliminate the obviously confusing stuff, confirm the obviously good stuff, and narrow down the set of features you want to A/B test to a more manageable size. There will still be questions that are best answered by statistics, but there will be a lot fewer of them.

Qualitative Testing Generates New Ideas for Features and Designs

While A/B testing helps you eliminate features or designs that clearly aren’t working, it can’t give you new ideas. Users can. If every user you interview gets stuck in the same place, you’ve identified a new problem to solve. If users are unenthusiastic about a particular feature, you can explore what’s missing with them and let them suggest ways to make the product more engaging.

Talking to your users allows you to create a hypothesis that you can then validate with an A/B test. For example, maybe all of the users you interviewed about your check out flow got stuck selecting a shipment method. To address this, you might come up with ideas for a couple of new shipment flows that you can test in production once you’ve confirmed that they’re less confusing with another quick qualitative test.

A/B Testing Creates a Feedback Loop for Researchers

A/B tests can also improve your qualitative testing process by providing statistical feedback to your researchers. I, as a researcher, am going to observe participants during tests in order to see what they like and dislike. I’m then going to make some educated guesses about how to improve the product based on my observations. When I get feedback about which recommendations are the most successful, it helps me learn more about what’s important to users so I make better recommendations in the future.

Any Final Words?

Separately, both A/B testing and qualitative testing are great ways to learn more about your users and how they interact with your product. Combined, they are more than the sum of their parts. They form an incredibly powerful tool that can help you make good, user-centered product decisions more quickly and with more confidence than you have ever imagined.

12 Comments

  1. We’re in the process of starting our test bed and plan on utilizing it in a very similar way to what you’ve described. For our qualitative tests, we’ve been using UserTesting.com. What’s nice about their service is that their users respond very quickly, which pairs nicely with an iterative/agile environment.

    Thx for sharing your wisdom.

    -isaacw

    posted by Isaac Weinhausen at 1:22 pm on 09.08.09
  2. Laura-

    I am always on the prowl for more practical tips on A/B and multivariate testing for my web properties. Kudos to you for teaching me something new tonight by weaving qualitative testing in there. Thanks!

    FYI- I have emailed the article URL to myself for future reference in my testing_metrics email folder.

    Can you tell me what, if anything, has surprised you when you combined qualitative testing with A/B testing?

    posted by Dan Hodgins at 10:54 pm on 09.08.09
  3. Really good post.

    There’s a culture on the web that testing can be left out – that it’s an optional addition to a project if there is enough time and money.

    This is really the wrong approach – you need to be testing right from the project outset. By testing early and testing often you’ll save money in the long run.

    Rob — http://testled.com

    posted by Rob Edwards at 5:04 am on 09.09.09
  4. Thanks all for the comments!

    Dan, you asked what surprised me when I combined qualitative with A/B testing. It was honestly surprising how well they went together. I had heard from so many engineers “oh, we don’t need to talk to users. We have metrics!” that I started to believe them.

    But of course I talked to users anyway, because that’s what I do, and I started to approach the engineers saying, “Hey, we’re hearing that this thing is a problem. Let’s change it in this particular way and use the metrics to find out if we were right.” It pretty quickly got to the point where we could predict which metrics (registration? revenue? retention?) were going to move significantly when we made a change based on user feedback.

    I think there is still resistance on both sides of the qual vs quant (or sometimes designers vs engineers) debate, but I have been absolutely convinced that an integrated approach works brilliantly for producing a better product.

    Laura

    posted by Laura Klein at 2:58 pm on 09.09.09
  5. Social comments and analytics for this post…

    This post was mentioned on Twitter by ericries: Learn about hypothesis generation vs validation: “A/B and Qualitative User Testing” http://bit.ly/5Em5UO

    posted by uberVU - social comments at 1:37 pm on 12.03.09
  6. Really good article, Laura! I try to argue for combining these things as well. Something else to consider is using multivariate testing, which does allow you to test each combination of different design elements together, but it is harder to get statistical significance with it.

    Sarah

    posted by Sarah DeAtley at 1:49 pm on 12.03.09
  7. enjoyed the post — thanks for taking the time to write out your thoughts

    posted by Giff at 5:04 pm on 12.03.09
  8. [...] A/B and Qualitative User Testing (HN discussion) [...]

  9. Interesting information, thanks for posting!

    posted by Steve at 3:14 pm on 02.19.11
  10. [...] [...]

  11. [...] firm slicedbread on when to use A/B testing vs. qualitativeuser [...]

  12. Great article. As you say the combining of qual and quant makes sense and I’m hoping to trial and prove just that in my current role. We’re very much a numbers organisation at the moment so I’ve got my work cut out – luckily I like a challenge! The qualitative research I do in my role at the moment is not held in great esteem by the business due to the small sample sizes (although it is by agile team members who experience it first hand). I’m striving to get more involved in the analytics we collect and analyse as well as the a/b testing we’ve begun to do in order to figure out how best to use the different data we get and how we can incorporate qualitative data as part of the norm. This article will definitely help me in trying to achieve this – so I’ll definitely be bookmarking and coming back! Thanks… :)

    posted by Bex Tindle at 12:55 pm on 05.08.14
