
Why is high variance an issue in A/B testing? What are some things that we can do to reduce variance?

“Variance is the core of experimental analysis.” Almost all the key statistical concepts around experimentation relate to variance. Assuming i.i.d. samples, we can estimate the variance as follows:

\[ \begin{aligned} \bar{Y} &= \frac{1}{n} \sum_i Y_i \\ \mathrm{Var}(Y) &= \hat\sigma^2 = \frac{1}{n-1} \sum_i (Y_i - \bar{Y})^2 \\ \mathrm{Var}(\bar{Y}) &= \mathrm{Var}\left(\frac{1}{n} \sum_i Y_i\right) = \frac{1}{n^2} \cdot n \cdot \mathrm{Var}(Y) = \frac{\hat\sigma^2}{n} \end{aligned} \]
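To make the three estimators concrete, here is a minimal Python sketch using only the standard library. The helper name `mean_and_variances` and the toy data are hypothetical and chosen for illustration. The second half is a small simulation (assuming normally distributed i.i.d. draws) that checks that the spread of many sample means is close to \(\sigma^2 / n\).

```python
import random
import statistics


def mean_and_variances(samples):
    """Return (Y-bar, sigma_hat^2, estimated Var(Y-bar)) for i.i.d. samples."""
    n = len(samples)
    y_bar = sum(samples) / n
    # Unbiased sample variance: divide by n - 1, not n
    sigma2_hat = sum((y - y_bar) ** 2 for y in samples) / (n - 1)
    # The variance of the sample mean shrinks as 1 / n
    return y_bar, sigma2_hat, sigma2_hat / n


y_bar, sigma2_hat, var_y_bar = mean_and_variances([2.0, 4.0, 4.0, 5.0, 7.0])
print(y_bar, sigma2_hat, var_y_bar)  # approximately 4.4, 3.3, 0.66

# Simulation check (hypothetical parameters): repeat an "experiment" many times
# and measure how much the sample mean itself varies.
random.seed(0)
sigma, n, reps = 2.0, 50, 2000
means = [statistics.fmean(random.gauss(0.0, sigma) for _ in range(n)) for _ in range(reps)]
empirical_var = statistics.variance(means)
print(empirical_var, sigma**2 / n)  # both should be close to 0.08
```

The hand-rolled variance matches `statistics.variance`, which also uses the \(n-1\) denominator.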

If we overestimate the variance, we are more likely to get false negatives; if we underestimate it, we are more likely to get false positives. To see why, consider the two-sample test statistic:

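\[ T = \frac{\Delta}{\sqrt{\mathrm{Var}(\Delta)}} \] where \(\Delta = \bar{Y}_t - \bar{Y}_c\) is the difference between the treatment and control means and, for independent groups, \(\mathrm{Var}(\Delta) = \mathrm{Var}(\bar{Y}_t) + \mathrm{Var}(\bar{Y}_c) = \hat\sigma_t^2 / n_t + \hat\sigma_c^2 / n_c\). Overestimating the variance inflates the denominator, which shrinks \(|T|\) and leads to false negatives: we fail to reject the null even when there is a real effect. Conversely, underestimating the variance shrinks the denominator, so we reject the null more often than we should.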
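The effect of mis-estimating the variance can be sketched directly. This minimal Python example computes \(T\) with the unpooled (Welch-style) variance of \(\Delta\). The `variance_scale` parameter is a hypothetical knob, not part of any standard test, that multiplies the estimated variance to mimic over- or underestimating it. The two samples are made-up numbers.

```python
import math
import statistics


def two_sample_t(treatment, control, variance_scale=1.0):
    """T = Delta / sqrt(Var(Delta)) with Var(Delta) = s_t^2/n_t + s_c^2/n_c.

    variance_scale is a hypothetical knob: > 1 overestimates the variance,
    < 1 underestimates it.
    """
    delta = statistics.fmean(treatment) - statistics.fmean(control)
    var_delta = (statistics.variance(treatment) / len(treatment)
                 + statistics.variance(control) / len(control))
    return delta / math.sqrt(variance_scale * var_delta)


# Made-up metric values for illustration only
treatment = [10.4, 11.1, 9.8, 10.9, 11.6, 10.2, 11.0, 10.7]
control = [10.0, 10.3, 9.5, 10.6, 10.1, 9.9, 10.4, 9.7]

t_true = two_sample_t(treatment, control)
t_over = two_sample_t(treatment, control, variance_scale=2.0)   # variance doubled
t_under = two_sample_t(treatment, control, variance_scale=0.5)  # variance halved
print(round(t_true, 2), round(t_over, 2), round(t_under, 2))  # 2.72 1.93 3.85
```

With the variance estimated as is, \(|T| \geq 1.96\) and the null is rejected. Doubling the variance estimate pulls \(|T|\) below 1.96, so the same data would produce a false negative if the effect is real.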

High variance is a problem in particular because it lowers statistical power, which increases the sample size the experiment needs. With a two-sided significance level of \(\alpha=0.05\), power is defined relative to \(\delta\), the **minimum delta of practical significance**:

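\[ \text{Power}_{\delta} = P(|T| \geq 1.96 \mid \text{true diff is } \delta) \] where \(T\) is the t-statistic and 1.96 is the two-sided critical value for \(\alpha = 0.05\). Then, assuming treatment and control are of equal size, the sample size needed to achieve 80% power is approximately (p.189 Kohavi, Tang, and Xu (2020)):

\[ n \approx \frac{16 \sigma^2} {\delta^2} \] where \(\sigma^2\) is the variance of the metric and \(n\) is the number of samples in each group. The constant comes from \(2(z_{0.975} + z_{0.80})^2 = 2(1.96 + 0.84)^2 \approx 15.7\). As a result, the required sample size grows linearly with the variance and shrinks with the square of \(\delta\): doubling \(\sigma^2\) doubles \(n\), while halving \(\delta\) quadruples it. Because of this effect on sample size, we often try to reduce variance, using techniques such as the following: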

- Remove outliers (e.g., bots)
- Cap (winsorize) or log-transform the variable of interest
- Use post-stratification to reduce variance within strata (or run a stratified experiment)
- Add control variables (covariates) in a regression
- Use CUPED, which adjusts the metric using pre-experiment data (https://www.statsig.com/blog/cuped)
- Randomize at a more granular level (e.g., session or page view rather than user)
- Run a within-subjects design A/B test (https://dovetail.com/research/within-subjects-design/)
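To connect the sample-size formula with one of these techniques, here is a minimal Python sketch under stated assumptions. `samples_per_group` applies the normal-approximation formula \(n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \delta^2\) per group, which reduces to the \(16\sigma^2/\delta^2\) rule for \(\alpha = 0.05\) and 80% power. `cuped_adjust` is a simplified CUPED step: it estimates \(\theta = \mathrm{Cov}(X, Y)/\mathrm{Var}(X)\) from a pre-experiment covariate \(X\) and returns \(Y - \theta(X - \bar{X})\). Both function names, the simulated data, and the value of \(\delta\) are hypothetical. In a real experiment, \(\theta\) would be estimated on pooled data from both groups, and \(X\) must not be affected by the treatment.

```python
import random
import statistics
from statistics import NormalDist


def samples_per_group(sigma2, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test with equal group sizes."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / delta ** 2


def cuped_adjust(y, x):
    """Simplified CUPED: remove the part of metric y explained by covariate x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    # Subtracting a zero-mean term keeps the metric's mean unchanged
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Simulated (hypothetical) data: pre-period metric x is strongly correlated with y
random.seed(1)
x = [random.gauss(10.0, 3.0) for _ in range(5000)]
y = [xi + random.gauss(0.0, 1.5) for xi in x]
y_adj = cuped_adjust(y, x)

var_raw, var_adj = statistics.variance(y), statistics.variance(y_adj)
delta = 0.2  # hypothetical minimum delta of practical significance
print(var_raw, var_adj)
print(samples_per_group(var_raw, delta), samples_per_group(var_adj, delta))
```

In this simulation the correlation between \(X\) and \(Y\) is about \(\rho^2 = 0.8\), so the adjusted variance should be roughly \((1 - \rho^2) = 20\%\) of the raw variance. Because \(n\) is linear in \(\sigma^2\), the required sample size per group drops by about the same factor.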
