machine-learning, scikit-learn, performance-testing, roc, precision-recall

Good ROC curve but poor precision-recall curve


I have some machine learning results that I don't quite understand. I am using Python's scikit-learn with 2+ million samples of about 14 features each. The classification of 'ab' looks pretty bad on the precision-recall curve, but the ROC curve for 'ab' looks just as good as those of most other groups. What can explain that?



Solution

  • Class imbalance.

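    Unlike ROC curves, PR curves are very sensitive to class imbalance. The ROC curve plots the true-positive rate against the false-positive rate, and each rate is computed within a single class, so neither changes with the ratio of positives to negatives. Precision, TP / (TP + FP), mixes counts from both classes: when negatives vastly outnumber positives, even a small false-positive rate produces many false positives relative to the true positives. If you optimize your classifier for good AUC on unbalanced data, you are likely to obtain poor precision-recall results.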
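
To make the effect concrete, here is a minimal sketch in plain Python. The operating point (90% recall, 5% false-positive rate) and the class counts are made-up numbers for illustration, not values from the question, and `precision_at` is a hypothetical helper. It holds one point on the ROC curve fixed and shows how the precision at that point changes with the class ratio.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Precision implied by a fixed ROC operating point and given class counts."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)


# Hypothetical operating point: the same (FPR, TPR) point on the ROC curve.
TPR, FPR = 0.90, 0.05

# Same classifier quality in ROC terms, two different class ratios.
balanced = precision_at(TPR, FPR, n_pos=1_000, n_neg=1_000)
imbalanced = precision_at(TPR, FPR, n_pos=1_000, n_neg=100_000)

print(f"balanced   (1:1):   precision = {balanced:.3f}")    # 0.947
print(f"imbalanced (1:100): precision = {imbalanced:.3f}")  # 0.153
```

The ROC point is identical in both cases, yet precision at 90% recall drops from about 0.95 to about 0.15 once negatives outnumber positives 100 to 1. On real scores, scikit-learn's `roc_auc_score` and `average_precision_score` (both in `sklearn.metrics`) can be compared per class to check whether the same thing is happening for 'ab'.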