How can we evaluate multiclass classification problems?

There are typically two approaches to solve multi-class problems:

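- One vs. all (OVA): train one binary classifier per class (that class vs. the rest); the predicted class is the one whose classifier is most confident (highest probability)
- One vs. one (OVO): train one binary classifier per pair of classes; the predicted class is defined through majority voting over the pairwise results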
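As a rough illustration of the two decision rules, the sketch below assumes the binary classifiers have already been trained and only shows how their outputs are combined. The class names, scores, and the helper names `predict_ova` and `predict_ovo` are hypothetical.

```python
from collections import Counter
from itertools import combinations

classes = ["cat", "dog", "bird"]

def predict_ova(scores):
    """scores: dict class -> confidence of that class's 'class vs. rest' model."""
    return max(scores, key=scores.get)

def predict_ovo(pairwise_winners):
    """pairwise_winners: dict (class_a, class_b) -> winner of that pairwise model."""
    votes = Counter(pairwise_winners.values())
    # Ties are broken by insertion order here; real systems need an explicit rule.
    return votes.most_common(1)[0][0]

# Hypothetical model outputs for a single sample.
ova_scores = {"cat": 0.2, "dog": 0.7, "bird": 0.4}
ovo_winners = {
    ("cat", "dog"): "dog",
    ("cat", "bird"): "bird",
    ("dog", "bird"): "dog",
}

# OVO needs one model per unordered pair of classes: K * (K - 1) / 2 models.
assert set(ovo_winners) == set(combinations(classes, 2))

ova_prediction = predict_ova(ova_scores)
ovo_prediction = predict_ovo(ovo_winners)
```

With K classes, OVA trains K models while OVO trains K(K-1)/2, each on a smaller subset of the data.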

We can evaluate such classifiers with:

- Confusion matrix (more visual, harder to quantify)
- Averaged (see below) precision, recall, F1, AUC (easier to quantify)
- Cohen's kappa score
- Matthews correlation coefficient
- Log loss
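To make a few of these metrics concrete, here is a minimal from-scratch sketch of a confusion matrix, Cohen's kappa, and log loss. The function names and the toy labels are assumptions for illustration; in practice a library implementation would normally be used.

```python
import math

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def cohen_kappa(matrix):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

def log_loss(y_true, probabilities, labels):
    """Mean negative log-probability assigned to the true class."""
    index = {label: i for i, label in enumerate(labels)}
    total = -sum(math.log(p[index[t]]) for t, p in zip(y_true, probabilities))
    return total / len(y_true)

labels = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

matrix = confusion_matrix(y_true, y_pred, labels)
kappa = cohen_kappa(matrix)
```

Log loss differs from the other two in that it needs predicted probabilities rather than hard labels, so it also penalizes over-confident wrong predictions.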

In addition, depending on the type of problem, we can use top-n accuracy (a prediction counts as correct if the true class is among the n highest-scored classes) or other ranking metrics.
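A small sketch of accuracy at top n, assuming each sample comes with a score per class. The function name `top_n_accuracy` and the toy scores are hypothetical.

```python
def top_n_accuracy(y_true, scores, n):
    """scores: list of dicts (class -> score), one dict per sample."""
    hits = 0
    for truth, sample_scores in zip(y_true, scores):
        # Rank classes from highest to lowest score for this sample.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if truth in ranked[:n]:
            hits += 1
    return hits / len(y_true)

y_true = ["cat", "dog", "bird"]
scores = [
    {"cat": 0.6, "dog": 0.3, "bird": 0.1},  # true class ranked 1st
    {"cat": 0.5, "dog": 0.4, "bird": 0.1},  # true class ranked 2nd
    {"cat": 0.5, "dog": 0.3, "bird": 0.2},  # true class ranked 3rd
]

top1 = top_n_accuracy(y_true, scores, n=1)
top2 = top_n_accuracy(y_true, scores, n=2)
```

Top-1 accuracy is ordinary accuracy; larger n is useful when several candidate classes are shown to a user, as in recommendation or search.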

Averaging techniques when we do OVA (or OVO):

- **macro**: arithmetic mean of the per-class metric, with every class counted equally regardless of its size. (In OVO: average over all possible pairwise combinations of classes.)
- **weighted**: accounts for class imbalance by weighting each class's metric by its number of true instances (classes with fewer instances have less impact on the averaged score).
- **micro**: pools the counts over all classes before computing the metric. In single-label multiclass problems this is the same as accuracy: the sum of the diagonal cells of the confusion matrix divided by the sum of all the cells. For that reason it is rarely used.
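The three averages can be compared on a small example. The sketch below computes macro, weighted, and micro precision from a confusion matrix (rows are true classes, columns are predicted classes). The matrix values and function names are made up for illustration.

```python
def per_class_precision(matrix):
    """Precision of class j = correct predictions of j / all predictions of j."""
    k = len(matrix)
    precisions = []
    for j in range(k):
        predicted_as_j = sum(matrix[i][j] for i in range(k))
        precisions.append(matrix[j][j] / predicted_as_j if predicted_as_j else 0.0)
    return precisions

def averaged_precision(matrix):
    k = len(matrix)
    support = [sum(row) for row in matrix]  # true instances per class
    total = sum(support)
    precisions = per_class_precision(matrix)
    macro = sum(precisions) / k
    weighted = sum(p * s for p, s in zip(precisions, support)) / total
    # Each prediction is a true positive or a false positive for exactly one
    # class, so pooled TP / (TP + FP) is the diagonal sum over the grand total.
    micro = sum(matrix[i][i] for i in range(k)) / total
    return {"macro": macro, "weighted": weighted, "micro": micro}

# Hypothetical imbalanced problem: class 0 has 10 samples, classes 1 and 2 have 5.
matrix = [
    [8, 1, 1],
    [2, 3, 0],
    [1, 0, 4],
]

averages = averaged_precision(matrix)
accuracy = sum(matrix[i][i] for i in range(3)) / sum(map(sum, matrix))
```

Here the micro average and accuracy are both 15/20, while the macro and weighted averages differ because the per-class precisions (8/11, 3/4, 4/5) are combined with different weights.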

Decision trees and Naive Bayes extend naturally to multiclass problems. Logistic regression and neural networks do so through a softmax output.
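For reference, a minimal sketch of softmax, which turns one raw score (logit) per class into a probability distribution over classes. The input values are arbitrary examples.

```python
import math

def softmax(logits):
    # Subtracting the maximum avoids overflow and does not change the result.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
# The predicted class is the index with the highest probability.
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
```

Because the outputs sum to 1, a single model covers all classes at once and no OVA or OVO decomposition is needed.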
