I know that we can use a list to indicate the order:
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0,1]).ravel()
but the meaning of the elements of the matrix depends on whether 0 or 1 is assumed to be the POSITIVE (or NEGATIVE) class, and neither assumption is directly mentioned in the docstring. This question has already been asked here, but I think I am asking about the root of the confusion, not the confusion matrix in general. The issue is not how to interpret the confusion matrix, but how to set a specific class as positive or negative.
Short answer
In binary classification, when using the labels argument,
confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0,1]).ravel()
the class labels 0 and 1 are considered Negative and Positive, respectively. This follows the order implied by the list, not the alpha-numerical order of the labels.
Verification: Consider imbalanced class labels like the following (the imbalance makes the distinction easier):
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
>>> y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]
>>> table = confusion_matrix(y_true, y_pred, labels=[0,1]).ravel()
this would give you a confusion table as follows:
>>> table
array([12, 1, 2, 1])
which corresponds to:
                 Actual
             |   1   |   0   |
      ________________________
pred  1      | TP=1  | FP=1  |
      0      | FN=2  | TN=12 |
where FN=2 means that there were 2 cases in which the model predicted the sample to be negative (i.e., 0) but the actual label was positive (i.e., 1); hence False Negatives equal 2. Similarly for TN=12: in 12 cases the model correctly predicted the negative class (0); hence True Negatives equal 12.
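You can confirm these four counts without sklearn by tallying them directly from the label pairs; a minimal sketch using the same data:

```python
# Manually tally the four cells, treating 1 as positive and 0 as negative.
y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(tn, fp, fn, tp)  # 12 1 2 1 -- same order as ravel() with labels=[0,1]
```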
This way everything adds up, assuming that sklearn considers the first label (in labels=[0,1]) as the negative class. Therefore, here, 0, the first label, represents the negative class.
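Consequently, if you want 1 to be treated as the negative class instead, you can simply reverse the list. A short sketch on the same data, showing how the unpacked values change with the label order:

```python
from sklearn.metrics import confusion_matrix

y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]

# labels=[0, 1]: the first entry (0) is the negative class.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 12 1 2 1

# Reversing the list makes 1 the negative class and 0 the positive class;
# ravel() still unpacks as tn, fp, fn, tp, but relative to the new order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(tn, fp, fn, tp)  # 1 2 1 12
```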