How do I manually calculate scores based on this confusion matrix?
What should the precision score be in this case? tp / (tp + fp) gives about 99% (102 / 103), right? But the reported precision score is only 98.36%. If the following scores are correct, why does the precision score not match? (The accuracy score is correct at 94.73% (162/171).)
I got this example from:
https://towardsdatascience.com/grid-search-for-model-tuning-3319b259367e
Update:
What should the label order be if I want to get the output as shown in this image?
The problem is that the TP and TN in your reading of the confusion matrix are swapped: the 102 at the top left is the count of true negatives, not true positives.
As described in this example of binary classification, the labels are interpreted as follows:
true negative expected=0, predicted=0
true positive expected=1, predicted=1
false negative expected=1, predicted=0
false positive expected=0, predicted=1
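The four cases above can be counted directly from paired labels. The following is a minimal sketch in plain Python; `count_outcomes` is a hypothetical helper name (not part of scikit-learn), the positive class is assumed to be 1, and the demo inputs are made up purely for illustration:

```python
def count_outcomes(expected, predicted, positive=1):
    """Count TN/TP/FN/FP for binary labels (hypothetical helper)."""
    counts = {"TN": 0, "TP": 0, "FN": 0, "FP": 0}
    for e, p in zip(expected, predicted):
        if e == positive and p == positive:
            counts["TP"] += 1   # expected=1, predicted=1
        elif e == positive:
            counts["FN"] += 1   # expected=1, predicted=0
        elif p == positive:
            counts["FP"] += 1   # expected=0, predicted=1
        else:
            counts["TN"] += 1   # expected=0, predicted=0
    return counts

# Tiny made-up inputs covering all four cases
demo_expected = [0, 1, 1, 0, 1]
demo_predicted = [0, 1, 0, 1, 1]
demo_counts = count_outcomes(demo_expected, demo_predicted)
print(demo_counts)
```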
For your example this would be:
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Blocks of labels, in this order: TN, TP, FN, FP
expected = [0]*102 + [1]*60 + [1]*8 + [0]*1
predicted = [0]*102 + [1]*60 + [0]*8 + [1]*1
print("precision " + '{:.16f}'.format(precision_score(expected, predicted)))
print("recall " + '{:.16f}'.format(recall_score(expected, predicted)))
print("accuracy " + '{:.16f}'.format(accuracy_score(expected, predicted)))
precision 0.9836065573770492
recall 0.8823529411764706
accuracy 0.9473684210526315
So the measures are as expected.
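To answer the "manually calculate" part of the question, the same numbers can be reproduced by hand from the four counts. This is a small sketch using only the counts from the example above; the variable names are chosen here for illustration:

```python
# Counts taken from the example above
tn, tp, fn, fp = 102, 60, 8, 1

precision = tp / (tp + fp)                   # 60 / 61
recall = tp / (tp + fn)                      # 60 / 68
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 162 / 171

print("precision {:.4f}".format(precision))  # precision 0.9836
print("recall    {:.4f}".format(recall))     # recall    0.8824
print("accuracy  {:.4f}".format(accuracy))   # accuracy  0.9474
```

The 102 / 103 in the question comes from putting the true negatives (102) where the true positives (60) belong.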
The confusion matrix is documented here
By definition a confusion matrix $C$ is such that $C_{i,j}$ is equal to the number of observations known to be in group $i$ but predicted to be in group $j$. Thus in binary classification, the count of true negatives is $C_{0,0}$, false negatives is $C_{1,0}$, true positives is $C_{1,1}$ and false positives is $C_{0,1}$.
This leads to the following result:
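from sklearn.metrics import confusion_matrix

# Rows are the expected labels, columns the predicted labels
results = confusion_matrix(expected, predicted)
print('TN', results[0][0])
print('TP', results[1][1])
print('FN', results[1][0])
print('FP', results[0][1])
print(results)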
TN 102
TP 60
FN 8
FP 1
[[102   1]
 [  8  60]]
So the measures are again OK; only the layout of the confusion matrix differs from the usual one with TP at the top left.
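To read the four counts without having to remember the index positions, the matrix can be flattened and unpacked in one step. A short sketch, assuming the same `expected` and `predicted` lists as in the example above (repeated here so the snippet runs on its own):

```python
from sklearn.metrics import confusion_matrix

# Same data as in the example above
expected = [0]*102 + [1]*60 + [1]*8 + [0]*1
predicted = [0]*102 + [1]*60 + [0]*8 + [1]*1

# ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = confusion_matrix(expected, predicted).ravel()
print(tn, fp, fn, tp)
```

Note that the unpacking order is TN, FP, FN, TP, which follows the row-major layout of the binary matrix.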
To get TP at the top left, it is enough to manually swap TP and TN:
(results[0][0],results[1][1]) = (results[1][1],results[0][0])
print(results)
[[ 60   1]
 [  8 102]]
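Regarding the label order asked about in the update: an alternative to swapping cells by hand is the `labels` parameter of `confusion_matrix`, which controls the order of the rows and columns. The image from the question is not reproduced here, so which layout it shows is an assumption; the sketch below prints both common TP-first layouts:

```python
from sklearn.metrics import confusion_matrix

# Same data as in the example above
expected = [0]*102 + [1]*60 + [1]*8 + [0]*1
predicted = [0]*102 + [1]*60 + [0]*8 + [1]*1

# labels=[1, 0] lists class 1 first; rows are expected, columns predicted
tp_first = confusion_matrix(expected, predicted, labels=[1, 0])
print(tp_first)    # layout [[TP, FN], [FP, TN]]

# Transposing makes rows the predicted labels instead
print(tp_first.T)  # layout [[TP, FP], [FN, TN]]
```

The transposed variant has the same numbers as the manually swapped matrix above, but every cell keeps a consistent meaning: rows are predictions, columns are expected labels.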