Assume that 1 in every 100 documents is written by AI. We have a predictive model with a true positive rate (recall) of 96%: it flags 96% of AI-written documents as written by AI. The model also makes a false positive prediction on 10% of human-written documents. If a document is labeled as written by AI, what is the probability that it was indeed written by AI?

To answer this question, we apply Bayes’ rule. From the description, we know that:

\[\begin{align} \Pr(\text{Written by AI}) &= 1/100 \\ \Pr(\text{Predicted Positive | Written by AI}) &= 96/100 \\ \Pr(\text{Predicted Positive | Written by humans}) &= 10/100 \end{align}\]

We are looking for \(\Pr(\text{Written by AI | Predicted Positive})\). From Bayes’ rule, we know that:

\[ \Pr(\text{Written by AI | Predicted Positive}) = \frac{\Pr(\text{Predicted Positive | Written by AI}) \Pr(\text{Written by AI})}{\Pr(\text{Predicted Positive})} \]

From the above, only the denominator is unknown. We compute it with the law of total probability, using \(\Pr(\text{Written by humans}) = 1 - 0.01 = 0.99\):

\[ \begin{align} \Pr(\text{Predicted Positive}) &= \Pr(\text{Predicted Positive | Written by AI}) * \Pr(\text{Written by AI}) \\ &+ \Pr(\text{Predicted Positive | Written by humans}) * \Pr(\text{Written by humans}) \\ &= 0.96 * 0.01 + 0.1 * 0.99 = 0.1086 \end{align} \]

Plugging in the numbers, we get:

\[ \Pr(\text{Written by AI | Predicted Positive}) = \frac{0.96 * 0.01}{0.1086} \approx 0.088 \]
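
The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior_positive` and its parameter names (`prior`, `tpr`, `fpr`) are chosen here for illustration and are not part of the original problem.

```python
def posterior_positive(prior, tpr, fpr):
    """Return Pr(class | predicted positive) using Bayes' rule.

    prior: Pr(Written by AI)
    tpr:   Pr(Predicted Positive | Written by AI)
    fpr:   Pr(Predicted Positive | Written by humans)
    """
    # Law of total probability: overall chance of a positive prediction.
    evidence = tpr * prior + fpr * (1.0 - prior)
    # Bayes' rule: likelihood times prior, divided by the evidence.
    return tpr * prior / evidence


p = posterior_positive(prior=0.01, tpr=0.96, fpr=0.10)
print(f"{p:.4f}")  # 0.0884
```

Keeping the denominator in its own variable mirrors the two-step derivation: first the probability of a positive prediction, then the ratio.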

Bayes’ rule can give counter-intuitive results when the classes are heavily imbalanced. Here, AI-written documents are so rare (1%) that false positives from the 99% of human-written documents (0.1 * 0.99 = 0.099) far outnumber true positives (0.96 * 0.01 = 0.0096). As a result, the posterior probability is only about 8.8%, despite the high recall (True Positive Rate) of the model (96%).
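
To see how strongly the prior drives the result, the same formula can be evaluated for a few prevalence values. This is a hedged sketch: the helper `posterior` and the extra prevalence values (10% and 50%) are assumptions added for illustration; only the 1% case comes from the problem statement.

```python
def posterior(prior, tpr=0.96, fpr=0.10):
    """Pr(Written by AI | Predicted Positive) for a given prevalence."""
    return tpr * prior / (tpr * prior + fpr * (1.0 - prior))


# Same model (96% recall, 10% false positive rate), different prevalences.
# The 10% and 50% values are hypothetical, chosen only for comparison.
for prior in (0.01, 0.10, 0.50):
    print(f"prior={prior:.2f} -> posterior={posterior(prior):.3f}")
# prior=0.01 -> posterior=0.088
# prior=0.10 -> posterior=0.516
# prior=0.50 -> posterior=0.906
```

With the model held fixed, the posterior rises from about 9% to about 91% as the prevalence moves from 1% to 50%, which shows that the low answer is caused by the rarity of AI-written documents rather than by a weak model.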

Bayes rule, Conditional probability

- Monty Hall Medium (Bayes rule, Conditional independence, Prior evidence)
- Unfair coin probability Easy (Bayes rule, Conditional probability)
- Two fair die rolls Easy (Independence, CDF, PMF)
- Largest number rolled Medium (Counting, Permutations, Repetition)
- Sample digits 1-10 Medium (Sample from samples, Uniform)