I'm following this tutorial https://youtu.be/0HDy6n3UD5M?t=1320 where he says he is calculating the false positives, but he ends up with a numpy array that, as far as I can tell, contains both the false negatives and the false positives.
E.g. confusion matrix is:
cm = confusion_matrix(y_train, y_pred, labels=[1, 0])
array([[250,  83],
       [ 76, 311]])
and he outputs the false positives as
FP = cm.sum(axis = 0) - np.diag(cm)
array([76, 83])
Shouldn't the false positives just be 83? I read in another article that he might be calculating "potential false positives", but what does that mean? The result looks like it mixes the FP and FN counts together.
The rest of the code is:
FN = cm.sum(axis = 1) - np.diag(cm)
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)
TPR = TP / (TP + FN)
It looks like the tutorial is computing these metrics in a class-dependent way.
Normally we think of "false positives" as a single number corresponding to an entry in the confusion matrix:
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Number of false positives: {fp}")
# Number of false positives: 1
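Side note: because the question's matrix was built with labels=[1, 0], the rows and columns are ordered [1, 0], so the same unpacking comes out in a different order (TP, FN, FP, TN, with class 1 as the positive class). A quick check with the toy data above:
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(tp, fn, fp, tn)
# 1 2 1 2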
But we can also frame the false positives in a class-dependent way. We can compute a confusion matrix for each class, giving a (C, 2, 2) array where C is the number of classes:
from sklearn.metrics import multilabel_confusion_matrix
mcm = multilabel_confusion_matrix(y_true, y_pred)
# [[[1 2]
#   [1 2]]
#
#  [[2 1]
#   [2 1]]]
Each 2x2 block is laid out as [[TN, FP], [FN, TP]], so we have a vector of true positives and a vector of false positives, one entry per class:
tps = mcm[:, 1, 1]
# [2 1]
fps = mcm[:, 0, 1]
# [2 1]
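To tie this back to the question: the tutorial's formulas, applied to the ordinary multiclass confusion matrix, produce exactly these per-class vectors. A quick sanity check with the same toy data:
import numpy as np
cm = confusion_matrix(y_true, y_pred)
# [[2 1]
#  [2 1]]
FP = cm.sum(axis=0) - np.diag(cm)  # column sums minus the diagonal -> per-class FP: [2 1]
FN = cm.sum(axis=1) - np.diag(cm)  # row sums minus the diagonal    -> per-class FN: [1 2]
TP = np.diag(cm)                   # the diagonal itself            -> per-class TP: [2 1]
TN = cm.sum() - (FP + FN + TP)     # everything else                -> per-class TN: [1 2]
# FP, FN, TP, TN match mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1], mcm[:, 0, 0]
So the array([76, 83]) in the question is not FP and FN summed; it is the false-positive count for class 1 followed by the false-positive count for class 0 (in the labels=[1, 0] order used there).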
This lets us compute metrics like "precision for each class":
print(f"Class-dependent precision: {tps / (tps + fps)}")
# Class-dependent precision: [0.5 0.5]
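The last line of the question's code (TPR) works the same way and gives the per-class recall, which is what shows up in the recall column of the report below:
fns = mcm[:, 1, 0]
# [1 2]
print(f"Class-dependent recall (TPR): {tps / (tps + fns)}")
# Class-dependent recall (TPR): [0.66666667 0.33333333]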
This is also how you arrive at the numbers in classification_report(y_true, y_pred):
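# for reference, the report below comes from the same toy y_true / y_pred
from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))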
              precision    recall  f1-score   support

           0       0.50      0.67      0.57         3
           1       0.50      0.33      0.40         3

    accuracy                           0.50         6
   macro avg       0.50      0.50      0.49         6
weighted avg       0.50      0.50      0.49         6
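Incidentally, if all you need are these per-class vectors, scikit-learn will compute them directly when you pass average=None:
from sklearn.metrics import precision_score, recall_score
print(precision_score(y_true, y_pred, average=None))  # [0.5 0.5]
print(recall_score(y_true, y_pred, average=None))     # [0.66666667 0.33333333]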