I have encountered an issue where the recall score obtained with recall_score(y, y_pred) does not match the value calculated manually from the confusion_matrix output.
What's more, the scikit-learn recall is exactly the same value as the specificity, which I've also calculated manually below.
Here is the relevant code I'm using:
from sklearn.metrics import recall_score, confusion_matrix

recall = recall_score(y, y_pred)           # <-- does not match manual_recall below
conf_matrix = confusion_matrix(y, y_pred)
tn, fp, fn, tp = conf_matrix.ravel()
manual_recall = tp / (tp + fn)             # <-- does not match recall above
specificity = tn / (tn + fp)               # <-- identical to recall above
Here's an example of a confusion matrix as printed in the terminal where this happens:
[[34  6]
 [20 20]]
Sci Kit Recall: 0.85 Manual Recall: 0.5
or
[[29 11]
 [ 9 31]]
Sci Kit Recall: 0.725 Manual Recall: 0.775
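For the first matrix, these numbers are easy to verify by hand. A quick sketch (the matrix values are copied from the first printout above):

import numpy as np

conf = np.array([[34,  6],
                 [20, 20]])
tn, fp, fn, tp = conf.ravel()

print(tp / (tp + fn))  # 0.5  -> the manual recall
print(tn / (tn + fp))  # 0.85 -> the manual specificity, identical to the scikit-learn recall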
Problem:
The recall returned by scikit-learn and the manually calculated recall do not produce the same value.
Question:
Why might recall_score and a manual calculation from confusion_matrix yield different results for the recall score?
More information...
It is a binary classification problem.
I'm using recall_score with its default settings.
I've checked whether the confusion matrix itself is accurate (it is).
As mentioned by desertnaut, the issue is that the two calculations treat different labels as the positive class.
recall_score by default considers 1 to be the positive label. From the scikit-learn documentation:
pos_label : int, float, bool or str, default=1
You can change this:
# Assuming your binary classes are 1 and 2
recall = recall_score(y, y_pred, pos_label=2)
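For instance, a minimal sketch with made-up data, assuming the classes really are 1 and 2:

import numpy as np
from sklearn.metrics import recall_score

# Toy labels using classes 1 and 2 (hypothetical values, for illustration only)
y      = np.array([1, 1, 1, 1, 2, 2])
y_pred = np.array([1, 1, 1, 2, 2, 1])

print(recall_score(y, y_pred))               # 0.75 -> recall of class 1 (default pos_label=1)
print(recall_score(y, y_pred, pos_label=2))  # 0.5  -> recall of class 2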
On the other hand, confusion_matrix uses the labels in sorted order by default, so in the usual tn, fp, fn, tp unpacking the positive label is the highest value. From the documentation for the labels parameter:
If None is given, those that appear at least once in y_true or y_pred are used in sorted order.
With classes 1 and 2, your manual tp / (tp + fn) is therefore the recall of class 2, while recall_score defaults to the recall of class 1. The recall of class 1 equals tn / (tn + fp) in your unpacking, which is exactly why the scikit-learn recall matches your manual specificity.
You can also change this:
conf_matrix = confusion_matrix(y, y_pred, labels=[1,2])
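Note that with classes 1 and 2, labels=[1, 2] matches the default sorted order and simply makes the choice explicit. If you instead want class 1 in the positive slot of the usual tn, fp, fn, tp unpacking, reversing the order does that. A sketch reusing the toy data from above:

import numpy as np
from sklearn.metrics import confusion_matrix

y      = np.array([1, 1, 1, 1, 2, 2])   # same toy data as in the earlier sketch
y_pred = np.array([1, 1, 1, 2, 2, 1])

# labels=[2, 1] puts class 1 in the second row/column, so the standard
# tn, fp, fn, tp unpacking now treats class 1 as the positive label
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[2, 1]).ravel()

print(tp / (tp + fn))  # 0.75 -> recall of class 1, matching recall_score's default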
It is recommended to always set the pos_label and labels parameters explicitly, so that both metrics use the same positive class.
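Putting it together, here is an end-to-end sketch (same toy data as above, with class 2 chosen as the positive label in both places) where the two calculations agree:

import numpy as np
from sklearn.metrics import recall_score, confusion_matrix

y      = np.array([1, 1, 1, 1, 2, 2])
y_pred = np.array([1, 1, 1, 2, 2, 1])

# Make class 2 the positive label in both calculations
recall = recall_score(y, y_pred, pos_label=2)

# With labels=[1, 2], class 2 sits in the second row/column, so the
# tn, fp, fn, tp unpacking also treats class 2 as positive
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[1, 2]).ravel()
manual_recall = tp / (tp + fn)

print(recall, manual_recall)  # 0.5 0.5 -> now identical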