
ROC vs. PR curve

Machine Learning · Medium · Seen in a real interview

Compare ROC with PR curves

We summarize the differences in the following table.

| Topic | AUC ROC | AU PR Curve |
| --- | --- | --- |
| Method | Captures the trade-off between the true positive rate and the false positive rate at different probability thresholds. | Captures the trade-off between precision and recall as the probability threshold varies. |
| Random classifier | A random classifier has AUC = 0.5 and a diagonal \(y=x\) (45-degree) line. | A random classifier has a horizontal line (depending on how it is plotted) at the positive class ratio, and an average precision (area under the PR curve) equal to the positive class ratio (see the example below). |
| Perfect classifier | A perfect classifier has AUC = 1 and a horizontal line at 1. | A perfect classifier has a point at (1, 1) (see below). |
| Skewed data | AUC ROC is more appropriate for relatively balanced datasets, as it is overly optimistic on highly imbalanced datasets. This happens because the false positive rate can be very low even when the classifier has very low precision. | PR curves shine on highly imbalanced datasets and are more informative when the focus is on correctly identifying the positive cases while minimizing the number of false positives. |
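
Before walking through the hand-built examples, the "Random classifier" and "Skewed data" rows can be sanity-checked numerically. The snippet below is a minimal sketch (it assumes NumPy and scikit-learn's roc_auc_score / average_precision_score are available): scoring a roughly 5%-positive dataset with random numbers gives an ROC AUC close to 0.5, while the average precision lands near the positive class ratio.

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=20_000)   # ~5% positives
scores = rng.uniform(size=20_000)        # scores unrelated to the labels

print(round(roc_auc_score(y, scores), 3))            # close to 0.5
print(round(average_precision_score(y, scores), 3))  # close to the positive class ratio
print(round(y.mean(), 3))                            # the positive class ratio itself (~0.05)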

Let’s see some examples.

import pandas as pd
d = pd.DataFrame({'prob' : [0.9,0.8,0.8,0.7,0.7,0.5,0.4,0.3,0.2,0.1], 'label': [1,1,1,1,1,0,0,0,0,0]})
d
##    prob  label
## 0   0.9      1
## 1   0.8      1
## 2   0.8      1
## 3   0.7      1
## 4   0.7      1
## 5   0.5      0
## 6   0.4      0
## 7   0.3      0
## 8   0.2      0
## 9   0.1      0

The above can be thought of as a perfect classifier, since it fully separates the positive from the negative instances: every positive scores above 0.5 and every negative scores at or below 0.5. The perfect AU PR curve is as follows:

import matplotlib.pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay, roc_curve, RocCurveDisplay, auc
display = PrecisionRecallDisplay.from_predictions(d.label, d.prob, name="Perfect classifier, AUPR")     

Similarly, the perfect AUC ROC curve:

def get_auc_plot(d, t):
    # Compute the ROC curve and its AUC, then plot it with the given title t.
    fpr, tpr, thresholds = roc_curve(d.label, d.prob)
    roc_auc = auc(fpr, tpr)
    display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc,
                              estimator_name=t)
    display.plot()

get_auc_plot(d, "Perfect classifier, AUC ROC")
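
As a numeric cross-check (a small sketch using scikit-learn's summary scorers, which are not part of the walkthrough above), both areas come out to 1.0 for this toy dataset:

from sklearn.metrics import roc_auc_score, average_precision_score

# A classifier that ranks every positive above every negative is perfect
# under both metrics.
print(roc_auc_score(d.label, d.prob))            # 1.0
print(average_precision_score(d.label, d.prob))  # 1.0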

Now let’s generate a random classifier (i.e., all instances have the same probability of being positive):


A random classifier is one that predicts the same probability for each instance.


# Random classifier: assign the same probability (0.2) to every instance
r = d.copy()
r['prob'] = [0.2 for _ in range(10)]
display = PrecisionRecallDisplay.from_predictions(r.label, r.prob, name="Random classifier, AUPR")

By default, the above function adds two points: (0, 1) and (0, positive ratio in the dataset). If we don’t want the second point, we can draw the AU PR curve as follows:

display.plot(drawstyle="default")
## <sklearn.metrics._plot.precision_recall_curve.PrecisionRecallDisplay object at 0x30f6aedf0>

For the same random classifier, the AUC ROC is:

get_auc_plot(r, "Random classifier, AUC ROC")


To understand how the AUPR is estimated in sklearn, you can check out the discussion here, and you can also call precision, recall, thresholds = precision_recall_curve(r.label, r.prob), which shows you the data points that sklearn uses to draw these graphs.
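
For example, on the random classifier above, that call boils down to the following (a quick sketch; the values in the comments are what the constant score of 0.2 should produce):

from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(r.label, r.prob)
print(precision)   # [0.5 1. ] -- 0.5 is the positive class ratio; (recall=0, precision=1) is the appended end point
print(recall)      # [1. 0.]
print(thresholds)  # [0.2]     -- only one distinct score, hence a single threshold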


An explanation of why AUC ROC is overly optimistic

Consider a dataset that has 10 positives and 100K negatives. Compare two models:

  • Model A: Predicts 900 positives, 9 of which are TP
  • Model B: Predicts 90 positives, 9 of which are TP

Obviously Model B is better. However:

  • Model A: TPR = 9/10, FPR = 891/100K
  • Model B: TPR = 9/10, FPR = 81/100K

Because the FPR denominator (the number of negatives) is so large, the two models end up with very similar ROC curves: both FPRs are close to zero. Simply put, a large change in the number of false positives produces only a tiny change in the FPR, so ROC analysis cannot reflect the superior performance of Model B in a setting where true negatives are not the quantity of interest.

In contrast, the Precision-Recall (PR) curve is specifically tailored to the detection of rare events and is the metric to use when the positive class is of more interest than the negative one. Because precision and recall do not use true negatives, the PR curve is not dominated by the abundant negative class. Back to the example above:

  • Model A: recall = TPR = 0.9 and precision = 9/900 = 0.01
  • Model B: recall = TPR = 0.9 and precision = 9/90 = 0.1

Clearly, PR analysis is more informative compared to the ROC analysis above.
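
The arithmetic above can be reproduced directly from the raw counts. The helper below is just a sketch (the counts are the ones stated in the example: 10 positives, 100K negatives):

P, N = 10, 100_000  # positives and negatives in the dataset

def rates(predicted_pos, tp):
    # Derive TPR/recall, FPR, and precision from the predicted-positive and true-positive counts.
    fp = predicted_pos - tp
    return {"TPR/recall": tp / P, "FPR": fp / N, "precision": tp / predicted_pos}

print(rates(900, 9))  # Model A: TPR 0.9, FPR 0.00891, precision 0.01
print(rates(90, 9))   # Model B: TPR 0.9, FPR 0.00081, precision 0.10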


Note that while the random baseline is fixed at 0.5 with ROC, the random baseline of the PR curve is determined by positive class prevalence, i.e. P / (P + N). Since the random baseline of the PR curve shifts based on the prevalence rate of the positive class, it is crucial to compare the AUPRC to its corresponding baseline, rather than looking at its absolute value.
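
For the 10-positive / 100K-negative example above, that baseline is tiny, which puts Model B's precision of 0.1 in perspective (a quick back-of-the-envelope check):

P, N = 10, 100_000
pr_baseline = P / (P + N)
print(pr_baseline)        # ~0.0001: the average precision a random classifier would achieve
print(0.1 / pr_baseline)  # Model B's precision is roughly 1000x the random baseline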



Topics

AUC, ROC, AUPR, Precision, Recall, Evaluation metrics