Tags: python, scikit-learn, confusion-matrix

Precision score does not match the metrics formula


How do I manually calculate scores based on this confusion matrix?

[Image: confusion matrix]

What should the precision score be in this case? tp / (tp + fp) translates to 99% (102 / 103), right? But the precision score is only 98.36%. If the following scores are correct, why does the precision score not match? (The accuracy score is correct at 94.73%, i.e. 162/171.)

[Image: computed precision, recall and accuracy scores]

I got this example from:

https://towardsdatascience.com/grid-search-for-model-tuning-3319b259367e


Update:

What should the label order be if I want to get the output shown in this image?

[Image: desired output]


Solution

  • The problem is that the TP and TN in your confusion matrix are swapped.

    As described in this example of binary classification, the labels are interpreted as follows:

    true negative:  expected=0, predicted=0
    true positive:  expected=1, predicted=1
    false negative: expected=1, predicted=0
    false positive: expected=0, predicted=1

    For your example this would be:

    from sklearn.metrics import precision_score, recall_score, accuracy_score, confusion_matrix

    ##              TN       TP       FN      FP
    expected =  [0]*102 + [1]*60 + [1]*8 + [0]*1
    predicted = [0]*102 + [1]*60 + [0]*8 + [1]*1

    print("precision " + '{:.16f}'.format(precision_score(expected, predicted)))
    print("recall    " + '{:.16f}'.format(recall_score(expected, predicted)))
    print("accuracy  " + '{:.16f}'.format(accuracy_score(expected, predicted)))
    
    precision 0.9836065573770492
    recall    0.8823529411764706
    accuracy  0.9473684210526315
    

    So the measures are as expected.
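
    Plugging the same counts used to build expected and predicted above (TN=102, TP=60, FN=8, FP=1) into the usual formulas reproduces these numbers by hand; a minimal sketch:

    # Hand calculation from the individual counts used above
    TP, FP, FN, TN = 60, 1, 8, 102

    precision = TP / (TP + FP)                    # 60 / 61   = 0.9836...
    recall    = TP / (TP + FN)                    # 60 / 68   = 0.8823...
    accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 162 / 171 = 0.9473...

    print(precision, recall, accuracy)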

    The confusion matrix is documented in scikit-learn under sklearn.metrics.confusion_matrix.

    By definition, a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j. Thus, in binary classification, the count of true negatives is C[0, 0], false negatives is C[1, 0], true positives is C[1, 1], and false positives is C[0, 1].

    This leads to the following result:

    results = confusion_matrix(expected, predicted)
    # scikit-learn layout: rows = true class, columns = predicted class, label 0 first
    print('TN ', results[0][0])
    print('TP ', results[1][1])
    print('FN ', results[1][0])
    print('FP ', results[0][1])
    print(results)
    
    
    TN  102
    TP  60
    FN  8
    FP  1
    [[102   1]
     [  8  60]]
    

    So the measures are again correct; only the layout of the confusion matrix differs from the usual textbook one, which has TP at the top left.

    The remedy is as simple as manually swapping the TP and TN counts:

    results[0][0], results[1][1] = results[1][1], results[0][0]
    print(results)
    
    [[ 60   1]
     [  8 102]]
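
    Regarding the update about the label order: confusion_matrix also accepts a labels argument that controls the row/column order, so listing the positive class first avoids the manual swap. A minimal sketch, assuming label 1 is the positive class:

    # Assumption: the positive class is label 1 and should come first.
    # With labels=[1, 0], rows/columns are ordered as 1 then 0, i.e.
    #   [[TP, FN],
    #    [FP, TN]]
    results_reordered = confusion_matrix(expected, predicted, labels=[1, 0])
    print(results_reordered)

    # [[ 60   8]
    #  [  1 102]]

    Note that this layout is the transpose of the manually swapped matrix above (the rows are still the true class); which of the two matches the desired image depends on whether its rows represent actual or predicted labels.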