python machine-learning artificial-intelligence roc auc

Is AUC a better metric than accuracy for imbalanced datasets in machine learning? If not, which metric is best?


Is AUC better at handling imbalanced data? In most cases where I deal with imbalanced data, accuracy does not give the right picture: the accuracy is high, yet the model performs poorly. If AUC is not the answer, which measure is best for imbalanced data?


Solution

  • Accuracy is a poor metric for imbalanced classes: if one class has 1% of the examples and the other has 99%, you can classify every example as the majority class (say, zero) and still get 99% accuracy.

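    Starting from the confusion matrix (below), you should also analyze Precision and Recall. Precision is the fraction of predicted positives that are actually positive, so it drops as false positives increase. Recall is the fraction of actual positives that the model finds, so it drops as false negatives increase.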

    Confusion Matrix

    Then you have to decide which kind of error matters more for your problem. In Predictive Maintenance, for example, a false positive is a healthy machine classified as failing, and a false negative is a failing machine classified as healthy. You can have 99% accuracy and an excellent AUC and still get 0% precision at the decision threshold you actually use.

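    Precision and Recall: Precision = TP / (TP + FP) and Recall = TP / (TP + FN), where TP, FP, and FN are the counts of true positives, false positives, and false negatives.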

    F1 score: F1 = 2 * Precision * Recall / (Precision + Recall)
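To make the points above concrete, here is a minimal sketch in plain Python (no libraries). Everything in it is an assumption for illustration: the data is made up (10 failing machines out of 1000), the helper names `confusion_counts`, `summarize`, and `pairwise_auc` are hypothetical, the 0.8 threshold is arbitrary, and a metric with a zero denominator is reported as 0.0 by convention. The AUC is computed as the share of (positive, negative) pairs that the scores rank correctly, with ties counted as half.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def pairwise_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 10 failing machines (label 1) and 990 healthy ones (label 0).
y_true = [1] * 10 + [0] * 990

# Case 1: a "model" that predicts healthy for every machine.
always_healthy = summarize(y_true, [0] * 1000)

# Case 2: scores that rank well overall, but 5 healthy machines get the top scores.
scores = [0.6] * 10 + [0.9] * 5 + [0.1] * 985
auc = pairwise_auc(y_true, scores)
at_threshold = summarize(y_true, [1 if s >= 0.8 else 0 for s in scores])

print(always_healthy)
print(round(auc, 3), at_threshold)
```

In case 1 the accuracy is 0.99 while recall is 0, because no failing machine is ever flagged. In case 2 the AUC is about 0.995 and the accuracy is 0.985, yet precision at the 0.8 threshold is 0: all five alarms are false. This is why, for imbalanced data, it helps to look at precision, recall, and F1 at the threshold you plan to use rather than at accuracy or AUC alone.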