A/B Testing

A controlled, randomized experiment that compares two product variants by measuring a predefined business metric across randomly assigned user groups to determine which performs better.
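Random assignment is what lets the comparison attribute a difference to the variant rather than to who happened to see it. The sketch below shows one common approach, hash-based bucketing. It assumes every user has a stable string ID and that traffic is split evenly; the function name `assign_variant` and the experiment name are hypothetical and not part of this method.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: tuple = ("A", "B")) -> str:
    """Deterministically bucket a user into one variant (hypothetical helper).

    Hashing "experiment:user_id" gives an effectively random but stable
    assignment: the same user always sees the same variant, and separate
    experiments bucket users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Usage: assign 10,000 hypothetical users and count the split.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "checkout-button")] += 1
print(counts)
```

Hashing instead of calling a random number generator at request time means no assignment table has to be stored, and a returning user never flips between variants mid-test.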

Key Fields

Question it answers	Does this change improve the key metric? How confident are we that the effect is real rather than due to chance?
Participants & timing	Sample size depends on the baseline rate and the minimum detectable effect; running time depends on traffic · typically 1,000-50,000 users · 2-8 weeks running time
AI compatibility	AI runs power analysis, monitors progress toward the pre-calculated sample size, and generates impact summaries; experiment design still requires human statistical judgment.
Output	Test results dashboard, statistical analysis report, user segment performance, business impact projection
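The power analysis mentioned above is what turns "depends on traffic and effect size" into a concrete number. Below is a minimal sketch using the standard normal-approximation sample-size formula for comparing two conversion rates with a two-sided test. It assumes a conversion-rate metric, an even split across variants, and conventional defaults (alpha = 0.05, 80% power); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant for a two-sided two-proportion z-test.

    baseline: control conversion rate, e.g. 0.10
    mde: minimum detectable effect as an absolute lift, e.g. 0.02 for 10% -> 12%
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde ** 2)


def weeks_to_run(per_variant: int, weekly_users: int, variants: int = 2) -> int:
    """Whole weeks needed when all weekly traffic is split evenly across variants."""
    return math.ceil(per_variant * variants / weekly_users)


n = sample_size_per_variant(baseline=0.10, mde=0.02)
print(n, weeks_to_run(n, weekly_users=1_000))
```

For a 10% baseline and a 2-point absolute lift this comes to a little over 3,800 users per variant, which at 1,000 users per week means roughly 8 weeks. Halving the detectable effect roughly quadruples the required sample, which is why small effects and low traffic push tests past the usual 2-8 week window. Rounding up to whole weeks also keeps weekday and weekend behavior balanced in the data.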

Use when

A measurable business metric and sufficient traffic exist
Design direction is unclear and two approaches need comparison
Statistical confidence is required, not just user opinion
An optimization culture is in place to iterate on the learnings

Skip when

No clear metric to optimize
Traffic is insufficient (under 1,000 users/week)
A major design change is needed (A/B testing is suited to incremental refinement)

Common Mistakes

Peeking at results early

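Stopping a test as soon as one variant pulls ahead invalidates the results: every extra significance check is another chance for random noise to cross the threshold, so repeated peeking pushes the false-positive rate well above the nominal significance level (for example, 5%), and early leads reverse approximately 30% of the time. Wait until the pre-calculated sample size is reached before declaring a winner.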
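The cost of peeking can be checked by simulation. The sketch below runs many A/A tests, where both arms share the same true conversion rate, so every "winner" is a false positive. It compares stopping at the first significant look against testing once at the planned sample size. The 10% conversion rate, 10 looks, and 100 users per look per arm are illustrative assumptions, and the pooled two-proportion z-test stands in for whatever test a real analysis would use.

```python
import math
import random


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both arms all-0 or all-1: no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_aa_tests(n_sims=1000, looks=10, users_per_look=100,
                      rate=0.10, alpha=0.05, seed=1):
    """Estimate false-positive rates for two stopping rules in A/A tests.

    peeking: declare a winner at the first look where p < alpha
    fixed:   test once, at the final (pre-calculated) sample size
    """
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        significant_at_some_look = False
        p = 1.0
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = two_proportion_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_some_look = True
        peeking_hits += significant_at_some_look
        fixed_hits += p < alpha  # p-value from the final look only
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives at fixed sample size: {fixed_rate:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate comes out several times higher even though neither variant is actually better.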

Optimizing for the wrong metric

"Variant increased clicks by 30%!" can hide a drop in total revenue if users click more but buy less. Define the primary business metric before the test starts; do not choose the interpretation after seeing the results.
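The click-versus-revenue trap is easy to reproduce with arithmetic. The numbers below are hypothetical, chosen only to mirror the example above; `PRIMARY_METRIC` represents a decision recorded before launch, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    users: int
    clicks: int
    revenue: float


def relative_lift(treatment: float, control: float) -> float:
    """Relative change of the treatment value over the control value."""
    return (treatment - control) / control


# Hypothetical results, not real data.
control = VariantResult(users=10_000, clicks=1_000, revenue=20_000.0)
variant = VariantResult(users=10_000, clicks=1_300, revenue=18_500.0)

METRICS = {
    "click_through_rate": lambda r: r.clicks / r.users,
    "revenue_per_user": lambda r: r.revenue / r.users,
}
PRIMARY_METRIC = "revenue_per_user"  # fixed before the test starts

for name, metric in METRICS.items():
    print(f"{name}: {relative_lift(metric(variant), metric(control)):+.1%}")

primary = METRICS[PRIMARY_METRIC]
ship = relative_lift(primary(variant), primary(control)) > 0
print("ship variant" if ship else "keep control")
```

Clicks rise 30% while revenue per user falls 7.5%, so the pre-registered metric says keep the control. A real analysis would also test the primary metric's difference for significance; this sketch only shows why the metric has to be chosen first.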

Ignoring confounding factors

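A campaign or seasonal lift changes overall traffic during the test period, so the users in the experiment may not represent normal traffic and the measured effect may not hold once conditions return to normal. Control for known confounding variables and run tests during stable periods.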
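One way to control for a known confounder such as a campaign is to stratify: compute the effect separately within each traffic segment and check whether it holds in organic traffic as well as in campaign traffic. The sketch below assumes each record carries a hypothetical `traffic_source` label alongside the variant and a conversion flag; the records in the usage example are made up for illustration.

```python
from collections import defaultdict


def lift_by_segment(records):
    """Relative conversion lift of B over A, computed within each segment.

    records: iterable of (variant, traffic_source, converted) tuples.
    Returns {segment: lift}, or None for a segment where the lift is undefined.
    """
    # counts[segment][variant] = [conversions, users]
    counts = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for variant, segment, converted in records:
        cell = counts[segment][variant]
        cell[0] += int(converted)
        cell[1] += 1
    lifts = {}
    for segment, arms in counts.items():
        (conv_a, users_a), (conv_b, users_b) = arms["A"], arms["B"]
        if users_a == 0 or users_b == 0 or conv_a == 0:
            lifts[segment] = None  # not enough data to compare this segment
            continue
        rate_a, rate_b = conv_a / users_a, conv_b / users_b
        lifts[segment] = (rate_b - rate_a) / rate_a
    return lifts


def make_records(variant, segment, users, conversions):
    """Build hypothetical per-user records for one variant/segment cell."""
    return [(variant, segment, i < conversions) for i in range(users)]


records = (
    make_records("A", "organic", 100, 10) + make_records("B", "organic", 100, 10)
    + make_records("A", "campaign", 100, 10) + make_records("B", "campaign", 100, 15)
)
print(lift_by_segment(records))
```

In this made-up data the whole lift comes from campaign traffic while organic users show none, a sign the overall result may not survive once the campaign ends.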