Bayesian A/B testing — a practical primer

By Maximilian Speicher, June 2025

Congratulations! The results of your A/B test came back. The analytics team gave you the usual frequentist analysis of the collected data. Based on this, your new countdown timer in the shopping cart yielded an uplift of 1.8%p at a p-value of 0.012. The 95% confidence interval for the uplift is [0.4%p, 3.2%p].
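For concreteness, here is a sketch of how such a frequentist analysis is typically produced, as a two-proportion z-test in Python. The counts are hypothetical, chosen only to roughly reproduce the quoted figures; they are not the article's data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical counts, chosen to roughly reproduce the figures quoted above
# (these are NOT the article's actual data)
conv_a, n_a = 375, 3750   # control:  10.0% conversion rate
conv_b, n_b = 443, 3750   # variant: ~11.8% conversion rate

p_a, p_b = conv_a / n_a, conv_b / n_b
uplift = p_b - p_a  # absolute uplift in percentage points

# Two-proportion z-test of H0: no difference (pooled standard error)
p_pool = (conv_a + conv_b) / (n_a + n_b)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = uplift / se_pool
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

# 95% Wald confidence interval for the difference (unpooled standard error)
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci_low, ci_high = uplift - 1.96 * se, uplift + 1.96 * se

print(f"uplift = {uplift:+.1%}, p-value = {p_value:.3f}")  # ~ +1.8%p, 0.012
print(f"95% CI = [{ci_low:+.1%}, {ci_high:+.1%}]")         # ~ [0.4%p, 3.2%p]
```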

Therefore, as the conclusion of your report, you write that there is a 98.8% chance of 1.8%p more conversions, and a 95% chance that the true uplift will lie between 0.4%p and 3.2%p. Pretty sweet result.¹

A/B testing has become the gold standard for data-driven decision making, but there’s a persistent gap between what practitioners want to know and what commonly applied statistical methods — i.e., frequentist statistics — can tell them. “Frequentist,” very roughly speaking, means calculating the probability of data given a (null) hypothesis. In a previous ACM Interactions article on statistical significance in A/B testing (Speicher, 2022), I highlighted how p-values and significance testing are often misunderstood and misapplied in practice.
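In symbols (a rough paraphrase of that distinction, not a formula from the article):

```latex
% Frequentist: how surprising is the data, assuming the null hypothesis H_0 is true?
p = P(\text{data at least this extreme} \mid H_0)

% Bayesian: how probable is a hypothesis H, given the data we actually observed?
P(H \mid \text{data}) = \frac{P(\text{data} \mid H)\, P(H)}{P(\text{data})}
```

The p-value of 0.012 above is a statement of the first kind; the "98.8% chance" reading treats it as a statement of the second kind.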

Why does this matter here? Because the conclusion you drew above from what the analytics team gave you is completely wrong. 🤯

“Great! We have a 98.8% chance of achieving a 1.8%p uplift” is a very natural reaction. And it is an…
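For contrast, here is a minimal sketch of what a Bayesian analysis of the same experiment could legitimately report, assuming a Beta-Binomial model with uniform Beta(1, 1) priors and the same hypothetical counts as in the frequentist sketch above:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Same hypothetical counts as in the frequentist sketch above
conv_a, n_a = 375, 3750
conv_b, n_b = 443, 3750

# Beta-Binomial model: with a uniform Beta(1, 1) prior, the posterior for a
# conversion rate is Beta(1 + conversions, 1 + non-conversions)
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=200_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=200_000)

uplift = post_b - post_a  # posterior samples of the absolute uplift

# Statements a Bayesian analysis can actually make:
lo, hi = np.quantile(uplift, [0.025, 0.975])
print(f"P(variant beats control) = {(uplift > 0).mean():.3f}")
print(f"posterior mean uplift    = {uplift.mean():+.1%}")
print(f"95% credible interval    = [{lo:+.1%}, {hi:+.1%}]")
```

Note what these outputs claim: a direct posterior probability that the variant beats the control, and a credible interval for the uplift. These are the kinds of statements the frequentist p-value and confidence interval are so often mistaken for.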
