Tags: python, classification, precision, confusion-matrix, precision-recall

Which metric should I use for an unbalanced binary classification model?


I trained an SVM on an unbalanced dataset, splitting train/test 70/30. The training set contains 1,163,993 instances of class 1 and 234,190 instances of class 0. The test set contains 498,699 instances of class 1 and 100,189 instances of class 0. The confusion matrix for the SVM is shown below:
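For reference, a minimal sketch of this kind of setup is shown below. It uses synthetic data from `make_classification` as a stand-in for the real features (with roughly the same ~17%/83% class imbalance) and `LinearSVC` as the SVM, since kernel SVMs are impractical at this dataset size; none of these choices come from the original question.

```python
# Sketch only: synthetic stand-in data, not the original dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC          # linear SVM; kernels are slow at this scale
from sklearn.metrics import confusion_matrix

# Roughly 17% class 0 / 83% class 1, mimicking the imbalance described above.
X, y = make_classification(n_samples=10_000, weights=[0.17, 0.83], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

clf = LinearSVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows = true class, columns = predicted class, ordered [1, 0].
print(confusion_matrix(y_test, y_pred, labels=[1, 0]))
```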

What metric should I use to evaluate the model? Would it be a reasonable solution to use the f-avg, i.e. to calculate precision, recall and F1 score for each class, as in this table:

[table of per-class precision, recall and F1 scores]

and then compute the f-avg as the arithmetic mean of the two per-class F1 scores, as reported in the last row of the table above?
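As an illustration of that approach, here is a minimal sketch using scikit-learn with toy labels (not the real predictions): `classification_report` gives the per-class precision, recall and F1, and the macro-averaged F1 is exactly the arithmetic mean described above.

```python
from sklearn.metrics import classification_report, f1_score

y_test = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]   # toy labels, skewed toward class 1
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # toy predictions

# Per-class precision, recall and F1 scores.
print(classification_report(y_test, y_pred, labels=[0, 1], digits=3))

# "f-avg": unweighted mean of the two per-class F1 scores.
print("f-avg (macro F1):", f1_score(y_test, y_pred, average="macro"))
```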


Solution

  • I assume you are looking for a score-based (probability) evaluation metric here. In my opinion, AUC-ROC, which is computed from the true positive rate (recall) and the false positive rate across all classification thresholds, would be the right approach here, as it is for most binary classification problems; see the sketch below.

    Also refer to this article.
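A minimal sketch of computing AUC-ROC for an SVM follows. It again uses synthetic stand-in data and `LinearSVC` (assumptions, not details from the question); since `LinearSVC` has no `predict_proba`, the continuous `decision_function` scores are passed to `roc_auc_score`, which accepts them directly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

# Synthetic imbalanced data as a stand-in for the real dataset.
X, y = make_classification(n_samples=10_000, weights=[0.17, 0.83], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

clf = LinearSVC().fit(X_train, y_train)
scores = clf.decision_function(X_test)   # continuous scores, not hard 0/1 labels
print("AUC-ROC:", roc_auc_score(y_test, scores))
```

If predicted probabilities are preferred, `SVC(probability=True)` or a calibrated classifier can be used instead of the raw decision scores; the AUC-ROC value is threshold-independent either way.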