Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation

Online experiments are frequently used at internet companies to evaluate the impact of new designs, features, or code changes on user behavior. Though the experiment design is straightforward in theory, in practice, there are many problems that can complicate the interpretation of results and render...

Full description

Saved in:
Bibliographic Details
Published in:2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) pp. 498 - 507
Main Authors: Zhenyu Zhao, Miao Chen, Matheson, Don, Stone, Maria
Format: Conference Proceeding
Language:English
Published: IEEE 01-10-2016
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Online experiments are frequently used at internet companies to evaluate the impact of new designs, features, or code changes on user behavior. Though the experiment design is straightforward in theory, in practice, there are many problems that can complicate the interpretation of results and render any conclusions about changes in user behavior invalid. Many of these problems are difficult to detect and often go unnoticed. Acknowledging and diagnosing these issues can prevent experiment owners from making decisions based on fundamentally flawed data. When conducting online experiments, data quality assurance is a top priority before attributing the impact to changes in user behavior. While some problems can be detected by running AA tests before introducing the treatment, many problems do not emerge during the AA period, and appear only during the AB period. Prior work on this topic has not addressed troubleshooting during the AB period. In this paper, we present lessons learned from experiments on various internet consumer products at Yahoo, as well as diagnostic and remedy procedures. Most of the examples and troubleshooting procedures presented here are generic to online experimentation at other companies. Some, such as traffic splitting problems and outlier problems have been documented before, but others have not previously been described in the literature.
DOI:10.1109/DSAA.2016.61