
Is it possible for a Precision-Recall curve or a ROC curve to be a horizontal line?


I am working on a binary classification task on imbalanced data.

Since accuracy is not very meaningful in this case, I use scikit-learn to compute the Precision-Recall curve and the ROC curve to evaluate model performance.

But I found that both curves come out as a horizontal line when I use a Random Forest with many estimators; the same thing happens when I fit an SGD classifier.

The ROC chart is as follows:

[image: ROC curve]

And the Precision-Recall chart:

[image: Precision-Recall curve]

Since Random Forest is randomized, I don't get a horizontal line in every run; sometimes I get regular ROC and PR curves. But the horizontal line is much more common.

Is this normal? Or did I make a mistake in my code?

Here is the snippet of my code:

import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

classifier.fit(X_train, Y_train)
try:
    # Prefer the decision function when the classifier provides one (e.g. SGDClassifier)
    scores = classifier.decision_function(X_test)
except AttributeError:
    # Fall back to positive-class probabilities (e.g. RandomForestClassifier)
    scores = classifier.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(Y_test, scores, pos_label=1)
average_precision = average_precision_score(Y_test, scores)

plt.plot(recall, precision, label='area = %0.2f' % average_precision, color="green")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc="lower right")
plt.show()

Solution

  • Yes, it can. If the classifier perfectly separates the data into two piles, the ROC curve is a vertical line followed by a horizontal one: as the threshold passes over the pile of true positives, the true-positive rate goes from 0 to 1 with no false positives (the vertical segment); then, as the threshold passes over the pile of true negatives, the false-positive rate goes from 0 to 1 (the horizontal segment).

    If you can get the same ROC curve on a held-out test set, you are golden. If you can get the same ROC curve across 5 different k-fold cross-validation test sets, you are platinum.
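A minimal sketch of this degenerate case, using toy data (not the question's data): when every positive score exceeds every negative score, `sklearn.metrics.roc_curve` returns exactly the vertical-then-horizontal shape described above, and the AUC is 1.0.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy labels and perfectly separating scores:
# every positive example outranks every negative one.
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("FPR:", fpr)   # goes 0 -> 0 -> 1
print("TPR:", tpr)   # goes 0 -> 1 -> 1
print("AUC:", auc(fpr, tpr))  # 1.0 for perfect separation
```

The curve has only three points: (0, 0), (0, 1), (1, 1). Plotted, that is the right-angle "horizontal line at TPR = 1" shape from the question's chart, so a flat curve can simply mean the model separates the test set perfectly.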