What are some reasons that can cause Sample Ratio Mismatch (SRM)? How do you test for SRM?

First, let’s recall what SRM is:

Sample ratio mismatch (SRM) occurs when the ratio of observations between control and treatment is not close to the designed (expected) ratio. SRM suggests that something in the experiment is not working as it should (for example, randomization did not work properly), and hence it threatens the experiment’s internal validity (p.45, Kohavi, Tang, and Xu (2020)).

Some of the reasons that SRM can happen include:

Page redirects for treatment (e.g., the treatment is implemented through web page redirects that take significantly longer)

Bad hash randomization (more generally, bugs in the randomization code)

Treatment-triggering conditions that are influenced by the experiment itself (more generally, bad trigger conditions can lead to imbalance)

Data pipeline and logging issues, e.g. removing users who are inactive or deemed bots

Time-of-day effects: if test and control receive the treatment at different times of day, metric measurement can be biased
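To make the hash randomization item above more concrete, here is a minimal sketch of how hash-based assignment is often done and how its balance can be eyeballed. The function name `assign_variant`, the salt `exp_42`, and the `user_<i>` ids are hypothetical and chosen only for illustration; this is not taken from any particular experimentation platform.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (illustrative sketch)."""
    # Hash the salted user id so the same user always lands in the same group.
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex digits (32 bits) and scale them to a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


# Assign 10,000 hypothetical users and count the group sizes.
counts = Counter(assign_variant(f"user_{i}", "exp_42") for i in range(10_000))
print(counts)
```

With a well-behaved hash, the two counts should land close to the designed 50/50 split. A bug in this step, such as a mapping from hash to bucket that is not uniform, is one way the observed ratio can drift away from the design.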

We can test for SRM with a chi-squared goodness-of-fit test, where the null hypothesis is that the observed sample ratio equals the designed ratio (here, 1). Suppose, for instance, that we designed an even split of 1000 users, but the observed groups are 550 and 450. We can compute the \(\chi^2\) statistic as follows:

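\[ \chi^2 = \sum_i \frac{(O_i - N p_i)^2}{N p_i} = \frac{(550-500)^2 + (450-500)^2}{500} = 10 \]

Here \(O_i\) is the observed count in group \(i\), \(N\) is the total sample size, and \(p_i\) is the designed share of group \(i\). With one degree of freedom, a \(\chi^2\) statistic of 10 corresponds to a p-value of about 0.002, which is too small to be plausible under the null, and hence we can reject the null.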

```
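from scipy.stats import chisquare

observed = [550, 450]  # observed users in control and treatment
expected = [500, 500]  # counts implied by the designed 50/50 split
# Pearson chi-squared goodness-of-fit test (1 degree of freedom here)
stat, pval = chisquare(observed, f_exp=expected)
print(f"chi squared statistic: {stat} \np-val: {pval:.3f}")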
```

```
## chi squared statistic: 10.0
## p-val: 0.002
```
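The same test extends to designs that are not 50/50. Below is a minimal sketch of a reusable check; the helper name `srm_check` and the default threshold `alpha=0.01` are assumptions made for illustration (teams often pick a stricter cutoff), and the 90/10 counts are made-up numbers.

```python
from scipy.stats import chisquare


def srm_check(observed_counts, designed_ratios, alpha=0.01):
    """Chi-squared SRM check for an arbitrary designed split (illustrative sketch).

    observed_counts: observed users per group, e.g. [550, 450]
    designed_ratios: designed allocation weights, e.g. [1, 1] or [9, 1]
    alpha: significance threshold (an assumption; choose it to suit your setup)
    """
    total = sum(observed_counts)
    ratio_sum = sum(designed_ratios)
    # Expected counts if traffic had followed the designed allocation exactly.
    expected = [total * r / ratio_sum for r in designed_ratios]
    stat, pval = chisquare(observed_counts, f_exp=expected)
    return float(stat), float(pval), bool(pval < alpha)


# The 50/50 example from the text.
print(srm_check([550, 450], [1, 1]))

# A hypothetical 90/10 design with 10,000 users.
print(srm_check([9020, 980], [9, 1]))
```

The third element of the returned tuple flags whether the observed split is unlikely under the designed ratio at the chosen threshold.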

You can find more info regarding the \(\chi^2\) test here: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
