I am working on a binary classification task. My evaluation data is imbalanced: approximately 20% of the instances belong to class1 and 80% to class2. Even though I have decent per-class accuracy (i.e., per-class recall), 0.602 on class1 and 0.792 on class2, the F1 score computed over class1 is only 0.46, since the false-positive count is large. Computed over class2, the F1 score is 0.84.
My question is: what is the best practice for evaluating a classification task on imbalanced data? Should I average these F1 scores, or choose one of them? What is the best evaluation metric for classification tasks on imbalanced data?
Btw, these are my TP, TN, FN, and FP counts:
TP: 115
TN: 716
FN: 76
FP: 188
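For reference, the per-class F1 scores above can be reproduced from these counts with scikit-learn. The label arrays below are a stand-in reconstruction from the counts (treating class1 as the positive label 1 and class2 as 0), not my actual data:

```python
from sklearn.metrics import f1_score
import numpy as np

# Reconstruct labels from the confusion-matrix counts
# (class1 = positive = 1, class2 = negative = 0).
y_true = np.array([1] * (115 + 76) + [0] * (716 + 188))  # 191 class1, 904 class2
y_pred = np.array([1] * 115 + [0] * 76 + [0] * 716 + [1] * 188)

print(f1_score(y_true, y_pred, pos_label=1))  # ~0.465, F1 over class1
print(f1_score(y_true, y_pred, pos_label=0))  # ~0.844, F1 over class2
```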
I am not sure if this is exactly what you are looking for, but since the data you want to compute a performance metric on is imbalanced, you could apply weighted measurements, such as a weighted F1 score. scikit-learn's f1_score offers a 'weighted' averaging option, which weights each class's score by its number of true instances. This way you get a single averaged F1 score.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
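Here is a minimal sketch using labels reconstructed from the counts in your question (class1 = 1, class2 = 0); with your own y_true and y_pred arrays, only the average argument changes:

```python
from sklearn.metrics import f1_score
import numpy as np

# Labels reconstructed from the question's counts:
# TP=115, FN=76, TN=716, FP=188 (class1 = 1, class2 = 0).
y_true = np.array([1] * (115 + 76) + [0] * (716 + 188))
y_pred = np.array([1] * 115 + [0] * 76 + [0] * 716 + [1] * 188)

# 'macro' averages the two per-class F1 scores equally;
# 'weighted' weights each class's F1 by its number of true instances.
print(f1_score(y_true, y_pred, average='macro'))     # ~0.65
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.78
```

Note that the weighted average is dominated by the majority class, so on imbalanced data the macro average is often the stricter of the two.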
I hope that helps!