A/B Testing

Controlled randomized experiment comparing two product variants by measuring a specific business metric across randomly assigned user groups to determine which performs better.


Process

Key Fields

Question it answers: Does this change improve the key metric? How confident are we that the effect is real and not random?
Participants & timing: Depends on traffic and effect size · typically 1,000-50,000 users · 2-8 weeks running time
AI compatibility: AI runs power analysis, monitors progress toward significance, and generates impact summaries; experiment design requires human statistical judgment.
Output: Test results dashboard, statistical analysis report, user segment performance, business impact projection
Use when
Skip when

Common Mistakes

Peeking at results early

Stopping a test as soon as one variant pulls ahead invalidates the results: each extra look at the data raises the chance of a false positive, and early leads reverse approximately 30% of the time. Wait for the pre-calculated sample size before declaring a winner.
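
The pre-calculated sample size can be estimated before launch. The following is a minimal sketch, assuming the metric is a conversion rate and using the standard normal-approximation formula for comparing two proportions. The function name, the 10% baseline, and the 12% target are hypothetical values chosen for illustration, not figures from this guide.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group for a two-sided two-proportion test."""
    if baseline_rate == expected_rate:
        raise ValueError("expected_rate must differ from baseline_rate")
    # Critical values for the significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    mean_rate = (baseline_rate + expected_rate) / 2
    # Variability under "no difference" and under the expected difference.
    null_term = z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
    alt_term = z_power * sqrt(
        baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    )
    return ceil((null_term + alt_term) ** 2 / (expected_rate - baseline_rate) ** 2)


# Hypothetical inputs: 10% baseline conversion, hoping to detect a lift to 12%.
n = sample_size_per_group(0.10, 0.12)
print(f"Users needed per variant: {n}")
```

Smaller expected effects raise the requirement sharply, which is why running time depends on both traffic and effect size.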

Optimizing for the wrong metric

"Variant increased clicks by 30%!" can hide a drop in total revenue when users click more but buy less. Define the primary business metric before the test starts; do not pick the metric or its interpretation after seeing the results.

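One way to guard against this is to test the pre-declared business metric alongside the proxy metric. The sketch below applies a pooled two-proportion z-test to both; it assumes each metric is a simple per-user rate, and every count in it is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (rate difference B - A, z statistic, two-sided p-value)."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: 10,000 users per variant.
users = 10_000
clicks = two_proportion_z_test(1_000, users, 1_300, users)   # proxy metric
purchases = two_proportion_z_test(300, users, 250, users)    # primary metric

for name, (diff, z, p) in {"clicks": clicks, "purchases": purchases}.items():
    print(f"{name}: difference={diff:+.4f}, z={z:.2f}, p={p:.4f}")
```

In this made-up data the variant wins on clicks and loses on purchases, so a decision based on clicks alone would ship a change that hurts the business metric.
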
Ignoring confounding factors

A campaign or seasonal lift that raises overall traffic during the test period can change who is visiting, so the measured effect may not hold under normal conditions. Control for confounding variables and run tests during stable periods.
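
A simple check for this kind of distortion is to compute the lift separately for each traffic condition and see whether it is consistent. The following is a rough sketch, assuming results can be broken down by a segment label such as "normal" versus "campaign" traffic; the segment names, the row layout, and all numbers are hypothetical.

```python
from collections import defaultdict


def lift_by_segment(rows):
    """rows: iterable of (segment, variant, conversions, users), variant is "A" or "B".

    Returns {segment: relative lift of B over A}. Assumes every segment has
    users in both variants and a non-zero conversion rate for variant A.
    """
    totals = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for segment, variant, conversions, users in rows:
        totals[segment][variant][0] += conversions
        totals[segment][variant][1] += users

    lifts = {}
    for segment, arms in totals.items():
        rate_a = arms["A"][0] / arms["A"][1]
        rate_b = arms["B"][0] / arms["B"][1]
        lifts[segment] = (rate_b - rate_a) / rate_a
    return lifts


# Hypothetical data: the variant helps regular visitors but not campaign visitors.
rows = [
    ("normal", "A", 100, 1_000),
    ("normal", "B", 110, 1_000),
    ("campaign", "A", 150, 2_000),
    ("campaign", "B", 150, 2_000),
]
for segment, lift in lift_by_segment(rows).items():
    print(f"{segment}: relative lift {lift:+.1%}")
```

If the lift differs sharply between segments, the blended result depends on the traffic mix during the test and should be interpreted with caution.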