A randomized controlled experiment that compares two product variants by randomly assigning users to groups and measuring a predefined business metric in each group, to determine which variant performs better.
| Question it answers | Does this change improve the key metric? How confident are we the effect is real and not random? |
|---|---|
| Participants & timing | Depends on traffic and effect size · typically 1,000-50,000 users · 2-8 weeks running time |
| AI compatibility | AI runs power analysis, monitors progress toward significance, and generates impact summaries; experiment design requires human statistical judgment. |
| Output | Test results dashboard, statistical analysis report, user segment performance, business impact projection |
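The power analysis mentioned above turns the baseline rate and the smallest effect worth detecting into a required sample size. The sketch below is one common textbook approach: the normal-approximation formula for a two-sided, two-proportion z-test without continuity correction. The function names, the 5% baseline, the +10% relative lift, and the 4,000 daily users are hypothetical inputs for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate: float, min_detectable_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH arm for a two-sided, two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.05 for 5%).
    min_detectable_lift: relative lift worth detecting (e.g. 0.10 for +10%).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for desired power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def run_time_days(n_per_group: int, daily_users: int, arms: int = 2) -> int:
    """Days needed if `daily_users` new users are split evenly across arms."""
    return math.ceil(n_per_group * arms / daily_users)


# Hypothetical inputs: 5% baseline conversion, detect a +10% relative lift
n = sample_size_per_group(0.05, 0.10)
days = run_time_days(n, daily_users=4000)
print(n, days)
```

With these inputs the formula asks for roughly 31,000 users per arm, which is why small lifts on low baseline rates push tests toward the upper end of the ranges in the table. Halving the detectable lift roughly quadruples the required sample.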
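Stopping a test as soon as one variant pulls ahead ("peeking") invalidates the results, because every extra significance check is another chance to mistake noise for a real effect; early leaders reverse approximately 30% of the time. Wait until the pre-calculated sample size is reached before declaring a winner.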
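A small simulation illustrates why peeking is dangerous. The sketch below runs A/A tests (both arms identical, so every "winner" is a false positive) and compares checking once at the planned sample size with checking at ten evenly spaced interim looks. The normal data with known standard deviation, the look schedule, and the function name are simplifying assumptions for illustration, not a model of any particular platform.

```python
import math
import random


def peeking_false_positive_rates(n_sims=1000, n_per_arm=500, looks=10,
                                 z_crit=1.96, seed=7):
    """Return (fixed_rate, peeking_rate) for simulated A/A tests.

    fixed_rate: share of tests significant at the final, planned look only.
    peeking_rate: share of tests significant at ANY of the interim looks.
    """
    rng = random.Random(seed)
    # Sample sizes at which an analyst peeks at the running test
    checkpoints = {round(n_per_arm * k / looks) for k in range(1, looks + 1)}
    fixed_hits = peek_hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        crossed_at_any_look = False
        z = 0.0
        for i in range(1, n_per_arm + 1):
            # Both arms draw from the same distribution: there is no real effect
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if i in checkpoints:
                # z statistic for a difference in means with known sd = 1
                z = (sum_a / i - sum_b / i) / math.sqrt(2.0 / i)
                if abs(z) > z_crit:
                    crossed_at_any_look = True
        peek_hits += crossed_at_any_look
        fixed_hits += abs(z) > z_crit  # z from the final look (i == n_per_arm)
    return fixed_hits / n_sims, peek_hits / n_sims


fixed, peeking = peeking_false_positive_rates()
print(f"fixed horizon: {fixed:.3f}, peeking at every look: {peeking:.3f}")
```

The fixed-horizon rate should land near the nominal 5%, while stopping at the first significant look produces false positives several times as often. Sequential testing methods exist for teams that genuinely need interim looks, but they require adjusted thresholds rather than the plain 1.96 cutoff used here.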
"Variant increased clicks by 30%!", yet total revenue fell because users clicked more but bought less. Define the primary business metric before the test starts; do not reinterpret results after the fact to fit a preferred outcome.
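One way to enforce this is to fix the decision metric in code before launch, so secondary metrics are reported but cannot overturn the verdict. The sketch below assumes revenue per user as the pre-declared primary metric; `ArmResult`, `decide`, and all numbers are hypothetical and chosen to mirror the clicks-versus-revenue pitfall. Significance testing is deliberately omitted to keep the focus on metric choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmResult:
    users: int
    clicks: int
    revenue: float


# Hypothetical plan: the primary metric is chosen BEFORE launch and never changed
PRIMARY_METRIC = "revenue_per_user"


def summarize(arm: ArmResult) -> dict:
    """Per-user metrics for one arm."""
    return {
        "click_rate": arm.clicks / arm.users,
        "revenue_per_user": arm.revenue / arm.users,
    }


def decide(control: ArmResult, variant: ArmResult, primary: str = PRIMARY_METRIC):
    """Return (verdict, relative lifts); only `primary` drives the verdict.

    Simplified: a real analysis would also require the primary lift to be
    statistically significant at the pre-calculated sample size.
    """
    c, v = summarize(control), summarize(variant)
    lifts = {metric: (v[metric] - c[metric]) / c[metric] for metric in c}
    verdict = "ship" if lifts[primary] > 0 else "do not ship"
    return verdict, lifts


# Hypothetical numbers: +30% click rate, but revenue per user drops 4%
control = ArmResult(users=10_000, clicks=800, revenue=50_000.0)
variant = ArmResult(users=10_000, clicks=1_040, revenue=48_000.0)
verdict, lifts = decide(control, variant)
print(verdict, lifts)
```

The click lift is still visible in `lifts`, so it can inform follow-up work, but the ship decision follows only the metric agreed on in advance.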
Overall traffic rose during the test period because of a marketing campaign or seasonal lift, so the measured effect may not hold once traffic returns to normal. Control for confounding variables and run tests during stable periods.
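A simple diagnostic is to break the lift down by period (or traffic source) and check whether it is stable across segments. The sketch below computes the relative conversion-rate lift per segment plus an overall `"all"` row; the segment labels, the row format, and the counts are hypothetical, chosen so the lift is concentrated in the campaign window.

```python
from collections import defaultdict


def lift_by_segment(rows):
    """Relative conversion-rate lift (variant vs control) per segment.

    rows: iterable of (segment, arm, users, conversions),
          where arm is "control" or "variant".
    An extra "all" segment aggregates every row.
    """
    # segment -> arm -> [users, conversions]
    totals = defaultdict(lambda: {"control": [0, 0], "variant": [0, 0]})
    for segment, arm, users, conversions in rows:
        for key in (segment, "all"):
            totals[key][arm][0] += users
            totals[key][arm][1] += conversions
    lifts = {}
    for segment, arms in totals.items():
        c_users, c_conv = arms["control"]
        v_users, v_conv = arms["variant"]
        c_rate = c_conv / c_users
        v_rate = v_conv / v_users
        lifts[segment] = (v_rate - c_rate) / c_rate
    return lifts


# Hypothetical counts: a modest lift in normal traffic, a large one during a campaign
rows = [
    ("baseline", "control", 5_000, 250),
    ("baseline", "variant", 5_000, 260),
    ("campaign", "control", 5_000, 400),
    ("campaign", "variant", 5_000, 480),
]
lifts = lift_by_segment(rows)
print(lifts)
```

Here the pooled lift (about 14%) hides a 4% lift in baseline traffic and a 20% lift during the campaign. A gap like that suggests the headline number will not persist after the campaign ends, and that the test should be extended or rerun in a stable period.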