How do I interpret this 10*10 confusion matrix?


Below is my confusion matrix with 10 categories. How do I calculate the accuracy for categories A, D, and E, and find TP, TN, FP, and FN for each?

     A   B   C   D   E   F   G   H   I   J
   [41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
   [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
   [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
   [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
   [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
   [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
   [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
   [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
   [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
   [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]

Thank you for the help!


Solution

  • Visualise your confusion matrix

    import pandas as pd

    # Raw counts: one row per actual label, one column per predicted label
    X = [[41, 0, 0, 2, 1, 0, 0, 0, 0, 4],
         [1, 0, 0, 0, 4, 0, 0, 0, 0, 2],
         [3, 0, 12, 0, 1, 0, 0, 0, 0, 0],
         [0, 0, 0, 51, 10, 0, 0, 0, 0, 0],
         [1, 0, 0, 3, 78, 0, 0, 0, 0, 5],
         [1, 0, 0, 0, 0, 0, 0, 0, 0, 3],
         [4, 0, 0, 0, 2, 0, 5, 0, 0, 4],
         [0, 0, 1, 1, 3, 0, 0, 2, 0, 1],
         [4, 0, 0, 0, 1, 0, 0, 0, 0, 0],
         [10, 0, 0, 5, 15, 0, 0, 0, 0, 24]]

    # Label both axes so cells can be looked up by category name
    cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

    print(cm)
    

    Output:

        A  B   C   D   E  F  G  H  I   J
    A  41  0   0   2   1  0  0  0  0   4
    B   1  0   0   0   4  0  0  0  0   2
    C   3  0  12   0   1  0  0  0  0   0
    D   0  0   0  51  10  0  0  0  0   0
    E   1  0   0   3  78  0  0  0  0   5
    F   1  0   0   0   0  0  0  0  0   3
    G   4  0   0   0   2  0  5  0  0   4
    H   0  0   1   1   3  0  0  2  0   1
    I   4  0   0   0   1  0  0  0  0   0
    J  10  0   0   5  15  0  0  0  0  24
    

    Here is how to read a confusion matrix: rows are actual labels and columns are predicted labels. A perfect model would produce a diagonal matrix, because every prediction would land on the correct label.

    Here, you can see that your model is sometimes wrong: it predicted A 10 times when the actual label was J. On the other hand, its predictions of G are reliable: all five times it predicted G, it was right (although it also missed 10 of the 15 actual G samples).
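To avoid mixing up the two axes when checking a single cell, a lookup by label can help. The snippet below is a small sketch that rebuilds the same `cm` and reads the two cells discussed above; the variable names `j_as_a` and `g_column` are only illustrative.

```python
import pandas as pd

labels = list("ABCDEFGHIJ")
X = [[41, 0, 0, 2, 1, 0, 0, 0, 0, 4],
     [1, 0, 0, 0, 4, 0, 0, 0, 0, 2],
     [3, 0, 12, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 51, 10, 0, 0, 0, 0, 0],
     [1, 0, 0, 3, 78, 0, 0, 0, 0, 5],
     [1, 0, 0, 0, 0, 0, 0, 0, 0, 3],
     [4, 0, 0, 0, 2, 0, 5, 0, 0, 4],
     [0, 0, 1, 1, 3, 0, 0, 2, 0, 1],
     [4, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [10, 0, 0, 5, 15, 0, 0, 0, 0, 24]]
cm = pd.DataFrame(X, index=labels, columns=labels)

# cm.loc[actual, predicted]: row label first, column label second
j_as_a = cm.loc["J", "A"]  # samples that were actually J but predicted as A

# A whole column lists the actual labels of everything predicted as G
g_column = cm["G"]

print(j_as_a)
print(g_column[g_column > 0])  # only the non-zero entries of column G
```

Note that `cm["J"]["A"]` would read the opposite cell (actual A, predicted J), since chained indexing selects the column first.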

    Category accuracy

    The accuracy for a category, as computed here, is the number of times it was predicted correctly divided by the number of times it was predicted (this ratio is usually called precision):

    >>> cm["A"]["A"] / cm.sum(axis=0)["A"]
    0.6307692307692307
    
    >>> cm["D"]["D"] / cm.sum(axis=0)["D"]
    0.8225806451612904
    
    >>> cm["E"]["E"] / cm.sum(axis=0)["E"]
    0.6782608695652174
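The same ratio can be computed for every category at once, together with its row-wise counterpart (correct predictions divided by the number of actual samples, usually called recall). This is a sketch under the same rows-are-actual convention; the names `precision` and `recall` are just illustrative variables, not part of the original answer.

```python
import numpy as np
import pandas as pd

labels = list("ABCDEFGHIJ")
X = [[41, 0, 0, 2, 1, 0, 0, 0, 0, 4],
     [1, 0, 0, 0, 4, 0, 0, 0, 0, 2],
     [3, 0, 12, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 51, 10, 0, 0, 0, 0, 0],
     [1, 0, 0, 3, 78, 0, 0, 0, 0, 5],
     [1, 0, 0, 0, 0, 0, 0, 0, 0, 3],
     [4, 0, 0, 0, 2, 0, 5, 0, 0, 4],
     [0, 0, 1, 1, 3, 0, 0, 2, 0, 1],
     [4, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [10, 0, 0, 5, 15, 0, 0, 0, 0, 24]]
cm = pd.DataFrame(X, index=labels, columns=labels)

# Correct predictions per category (the diagonal), labelled by category
correct = pd.Series(np.diag(cm.to_numpy()), index=labels)

# Column totals = times each category was predicted
precision = correct / cm.sum(axis=0)

# Row totals = times each category actually occurred
recall = correct / cm.sum(axis=1)

print(pd.DataFrame({"precision": precision, "recall": recall}).loc[["A", "D", "E"]])
```

Categories that were never predicted (B, F, and I here) have a column total of zero, so their precision comes out as `NaN` rather than a number.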
    

    TP, TN, FP, FN for each

    These measures are defined for binary classification. For a given category, however, you can treat the problem as one-vs-all (the category in question versus everything else), which is binary, and compute them from there.
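To make the one-vs-all idea concrete, the sketch below collapses the 10x10 matrix into a 2x2 matrix for a single category. The helper `one_vs_all` is a hypothetical name introduced here for illustration, and it assumes rows are actual labels and columns are predicted labels.

```python
import pandas as pd

labels = list("ABCDEFGHIJ")
X = [[41, 0, 0, 2, 1, 0, 0, 0, 0, 4],
     [1, 0, 0, 0, 4, 0, 0, 0, 0, 2],
     [3, 0, 12, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 51, 10, 0, 0, 0, 0, 0],
     [1, 0, 0, 3, 78, 0, 0, 0, 0, 5],
     [1, 0, 0, 0, 0, 0, 0, 0, 0, 3],
     [4, 0, 0, 0, 2, 0, 5, 0, 0, 4],
     [0, 0, 1, 1, 3, 0, 0, 2, 0, 1],
     [4, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [10, 0, 0, 5, 15, 0, 0, 0, 0, 24]]
cm = pd.DataFrame(X, index=labels, columns=labels)


def one_vs_all(cm, label):
    """Collapse a multi-class confusion matrix into label vs. everything else."""
    tp = cm.loc[label, label]                 # actual label, predicted label
    fn = cm.loc[label].sum() - tp             # actual label, predicted something else
    fp = cm[label].sum() - tp                 # actual something else, predicted label
    tn = cm.to_numpy().sum() - tp - fn - fp   # neither actual nor predicted label
    other = "not " + label
    return pd.DataFrame([[tp, fn], [fp, tn]],
                        index=[label, other], columns=[label, other])


binary_a = one_vs_all(cm, "A")
print(binary_a)
```

In the resulting 2x2 frame, the top-left cell is TP, top-right is FN, bottom-left is FP, and bottom-right is TN for the chosen category.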

    Borrowing from another answer, you can get the TP, TN, FP, and FN values for every category at once:

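    import numpy as np

    FP = cm.sum(axis=0) - np.diag(cm)   # predicted as the category, but actually something else
    FN = cm.sum(axis=1) - np.diag(cm)   # actually the category, but predicted as something else
    TP = pd.Series(np.diag(cm), index=list("ABCDEFGHIJ"))   # the diagonal
    TN = cm.to_numpy().sum() - (FP + FN + TP)   # everything that does not involve the category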
    

    Now, FP for category A is:

    >>> FP["A"]
    24  # you can verify: it's the sum of column A (65) minus the diagonal element (41)
    

    The same logic applies to all the other measures.
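As a quick check for the three categories in the question, the four counts can be gathered in one table. This sketch also adds a one-vs-all accuracy, (TP + TN) / total, which is one possible reading of "accuracy for a category" and differs from the precision-style ratio shown earlier. The name `summary` is illustrative only.

```python
import numpy as np
import pandas as pd

labels = list("ABCDEFGHIJ")
X = [[41, 0, 0, 2, 1, 0, 0, 0, 0, 4],
     [1, 0, 0, 0, 4, 0, 0, 0, 0, 2],
     [3, 0, 12, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 51, 10, 0, 0, 0, 0, 0],
     [1, 0, 0, 3, 78, 0, 0, 0, 0, 5],
     [1, 0, 0, 0, 0, 0, 0, 0, 0, 3],
     [4, 0, 0, 0, 2, 0, 5, 0, 0, 4],
     [0, 0, 1, 1, 3, 0, 0, 2, 0, 1],
     [4, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [10, 0, 0, 5, 15, 0, 0, 0, 0, 24]]
cm = pd.DataFrame(X, index=labels, columns=labels)

diag = np.diag(cm.to_numpy())
total = cm.to_numpy().sum()          # number of samples overall

TP = pd.Series(diag, index=labels)
FP = cm.sum(axis=0) - diag           # column total minus diagonal
FN = cm.sum(axis=1) - diag           # row total minus diagonal
TN = total - (TP + FP + FN)

summary = pd.DataFrame({"TP": TP, "TN": TN, "FP": FP, "FN": FN})

# One-vs-all accuracy: share of samples correctly placed in or out of the category
summary["accuracy"] = (summary["TP"] + summary["TN"]) / total

print(summary.loc[["A", "D", "E"]])
```

Because TN is large for every category in a 10-class problem, this one-vs-all accuracy tends to look high even for weak categories, so it is worth reading alongside FP and FN.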