machine-learning scikit-learn performance-testing roc precision-recall

Good ROC curve but poor precision-recall curve

I have some machine learning results that I don't quite understand. I am using Python scikit-learn, with 2+ million samples and about 14 features. The classification of 'ab' looks pretty bad on the precision-recall curve, but the ROC curve for 'ab' looks just as good as most other groups' classification. What can explain that?

Solution

Class imbalance.

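Unlike ROC curves, PR curves are very sensitive to class imbalance. The false positive rate on the ROC axis is divided by the total number of negatives, so when negatives vastly outnumber positives, even a small false positive rate can mean far more false positives than true positives, which drives precision down. If you optimize your classifier for good AUC on unbalanced data, you are likely to obtain poor precision-recall results.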
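To make the effect concrete, here is a small arithmetic sketch in plain Python. It takes one fixed ROC operating point and computes the precision it implies under two different class ratios. The operating point (80% recall at a 5% false positive rate), the 1% prevalence, and the helper name `precision_at` are made-up values for illustration, not numbers from the question.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Precision implied by one ROC operating point and the class counts."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)


# Hypothetical operating point: 80% recall at a 5% false positive rate.
tpr, fpr = 0.80, 0.05

# Same ROC point, 2 million samples, two assumed class ratios.
balanced = precision_at(tpr, fpr, n_pos=1_000_000, n_neg=1_000_000)
rare = precision_at(tpr, fpr, n_pos=20_000, n_neg=1_980_000)

print(f"balanced classes:  precision = {balanced:.3f}")
print(f"1% positive class: precision = {rare:.3f}")
```

The point on the ROC curve is identical in both cases, yet precision falls from roughly 0.94 to roughly 0.14 once the positive class is only 1% of the data. If 'ab' is a rare class, this would be consistent with a good-looking ROC curve next to a poor PR curve.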
