Tags: machine-learning, scikit-learn, precision-recall

Why do we use probabilities to calculate the precision-recall curve instead of the actual classes?


If I'm not wrong, we calculate precision and recall for classifiers from the final predicted labels. However, precision_recall_curve in sklearn uses the output of decision_function instead of the final class labels. Does this have any special impact on the final values? Does the degree of confidence affect the curve in any way?


Solution

  • The precision-recall curve is defined by varying the decision threshold. For each threshold, you get a different hard classifier whose precision and recall you can compute, and so you get a point on the curve.

    The precision_recall_curve computes a precision-recall curve from the ground truth label and a score given by the classifier by varying a decision threshold.

    Precision, recall and F-measures | Scikit-learn

    If you pass y_pred (the hard class predictions) as the score, then the precision-recall curve becomes degenerate, with only three points: (recall=1, precision=positive prevalence), the point corresponding to your (hard) classifier's precision and recall, and (recall=0, precision=1). The sketch below demonstrates both cases.
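
A quick way to see both behaviors is to call precision_recall_curve once with continuous scores and once with hard 0/1 predictions. This is a minimal sketch; the synthetic dataset, the LogisticRegression model, and the train/test split are illustrative choices, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary problem (illustrative choice),
# so the prevalence endpoint is visibly below 1.
X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)

# Continuous scores: every distinct score is a candidate threshold,
# so the curve has many points.
y_score = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_score)
print(len(precision))  # roughly one point per distinct score, plus the
                       # appended (recall=0, precision=1) endpoint

# Hard labels: only two distinct "scores" (0 and 1) exist, so the
# curve collapses to three points.
y_pred = clf.predict(X_test)
precision_d, recall_d, _ = precision_recall_curve(y_test, y_pred)
print(list(zip(recall_d, precision_d)))
# -> [(1.0, positive prevalence), (clf recall, clf precision), (0.0, 1.0)]
```

The degenerate output makes the answer concrete: with hard labels there are no thresholds left to vary, so the "curve" is just the classifier's single operating point plus the two endpoints.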