
Multiclass evaluation metrics

Machine Learning · Medium · Seen in a real interview

How can we evaluate multiclass classification problems?

There are typically two approaches to reducing a multiclass problem to binary classification:

  • One vs. all (OVA): train one binary classifier per class; the predicted class is the one whose classifier is most confident (highest predicted probability)
  • One vs. one (OVO): train a binary classifier for every pair of classes; the predicted class is chosen by majority voting across the pairwise classifiers (see the sketch below)
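
To make the two strategies concrete, here is a minimal scikit-learn sketch; the dataset, base estimator, and split are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3-class toy dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)

# OVA: one binary classifier per class; predict the class whose
# classifier outputs the highest score.
ova = OneVsRestClassifier(base).fit(X_train, y_train)

# OVO: one binary classifier per pair of classes; predict by
# majority vote over all pairwise classifiers.
ovo = OneVsOneClassifier(base).fit(X_train, y_train)

print("OVA accuracy:", ova.score(X_test, y_test))
print("OVO accuracy:", ovo.score(X_test, y_test))
```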

We can evaluate such classifiers with the following metrics (a scikit-learn sketch follows the list):

  • Confusion matrix (more visual, harder to quantify)
  • Averaged precision, recall, F1, and AUC (easier to quantify; see the averaging techniques below)
  • Cohen's kappa score
  • Matthews correlation coefficient
  • Log loss
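
As a sketch of how these metrics might be computed with scikit-learn (the model and data are placeholders; any fitted classifier that produces predicted labels and class probabilities would do):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, cohen_kappa_score,
                             confusion_matrix, log_loss, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)         # hard class labels
y_proba = clf.predict_proba(X_test)  # per-class probabilities

print(confusion_matrix(y_test, y_pred))       # visual: rows = true, cols = predicted
print(classification_report(y_test, y_pred))  # per-class and averaged precision/recall/F1
print("Cohen's kappa:", cohen_kappa_score(y_test, y_pred))
print("MCC:", matthews_corrcoef(y_test, y_pred))
print("Log loss:", log_loss(y_test, y_proba))
print("Macro OVA AUC:", roc_auc_score(y_test, y_proba, multi_class="ovr"))
```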

In addition, depending on the type of problem, we can use top-n accuracy or other ranking metrics.
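
Top-n accuracy counts a prediction as correct whenever the true class appears among the n most probable classes. A one-line sketch with scikit-learn, reusing `y_test` and `y_proba` from the snippet above:

```python
from sklearn.metrics import top_k_accuracy_score

# Correct if the true class is among the 2 highest-probability classes.
print("Top-2 accuracy:", top_k_accuracy_score(y_test, y_proba, k=2))
```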

Averaging techniques when we do OVA (or OVO):

  • macro: arithmetic mean of the metric across all classes, with every class weighted equally. (In OVO: the average over all possible pairwise combinations of classes.)

  • weighted: accounts for class imbalance by weighting each class's metric by its support, so classes with fewer instances have less impact on the averaged score.

  • micro: pools the contributions of all classes before computing the metric. For single-label multiclass problems, micro-averaged precision, recall, and F1 all equal accuracy: the sum of the diagonal cells of the confusion matrix divided by the sum of all cells. For that reason it is rarely reported separately. (A comparison of the three averages is sketched below.)
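
The three averages differ only in how per-class scores are combined. A small sketch, again reusing `y_test` and `y_pred` from above, shows all three for F1 and confirms that micro-averaging matches accuracy:

```python
from sklearn.metrics import accuracy_score, f1_score

print("Macro F1:   ", f1_score(y_test, y_pred, average="macro"))     # unweighted mean over classes
print("Weighted F1:", f1_score(y_test, y_pred, average="weighted"))  # mean weighted by class support
print("Micro F1:   ", f1_score(y_test, y_pred, average="micro"))     # pooled counts
print("Accuracy:   ", accuracy_score(y_test, y_pred))                # equals micro F1 here
```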


Decision trees and Naive Bayes extend naturally to multiclass problems. Logistic regression and neural networks do the same through a softmax output, which turns raw class scores into a probability distribution over classes.
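
A minimal NumPy sketch of the softmax itself; the logit values below are made up for illustration:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then exponentiate
    # and normalize so the outputs sum to 1.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw scores for 3 classes
print(softmax(logits))               # approx. [0.79, 0.18, 0.04]
```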


Topics

Multiclass, Diagnostics