Tags: python, scikit-learn, confusion-matrix, multilabel-classification

Multi-class multi-label confusion matrix with Sklearn


I am working with multi-class, multi-label output from my classifier. There are 14 classes in total, and each instance can have several classes associated with it. For example:

y_true = np.array([[0,0,1], [1,1,0],[0,1,0]])
y_pred = np.array([[0,0,1], [1,0,1],[1,0,0]])

This is how I currently build my confusion matrix:

matrix = confusion_matrix(y_true.argmax(axis=1), y_pred.argmax(axis=1))
print(matrix)

Which gives an output like:

[[ 79   0   0   0  66   0   0 151   1   8   0   0   0   0]
 [  4   0   0   0  11   0   0  27   0   0   0   0   0   0]
 [ 14   0   0   0  21   0   0  47   0   1   0   0   0   0]
 [  1   0   0   0   4   0   0  25   0   0   0   0   0   0]
 [ 18   0   0   0  50   0   0  63   0   3   0   0   0   0]
 [  4   0   0   0   3   0   0  19   0   0   0   0   0   0]
 [  2   0   0   0   3   0   0  11   0   2   0   0   0   0]
 [ 22   0   0   0  20   0   0 138   1   5   0   0   0   0]
 [ 12   0   0   0   9   0   0  38   0   1   0   0   0   0]
 [ 10   0   0   0   3   0   0  40   0   4   0   0   0   0]
 [  3   0   0   0   3   0   0  14   0   3   0   0   0   0]
 [  0   0   0   0   2   0   0   3   0   0   0   0   0   0]
 [  2   0   0   0  11   0   0  32   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   3   0   0   0   0   0   7]]

Now, I am not sure if the confusion matrix from sklearn is capable of handling multi-label multi-class data. Could someone help me with this?


Solution

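  • You need to generate one binary confusion matrix per label, since what you have is essentially a set of independent binary labels. Note that argmax(axis=1) keeps only the first positive label in each row, so the single matrix in the question discards the multi-label information.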

    Something along the lines of:

    import numpy as np
    from sklearn.metrics import confusion_matrix
    
    y_true = np.array([[0,0,1], [1,1,0],[0,1,0]])
    y_pred = np.array([[0,0,1], [1,0,1],[1,0,0]])
    
    labels = ["A", "B", "C"]
    
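    # One 2x2 confusion matrix per label, keyed by label name
    conf_mat_dict = {}
    
    for label_col, label in enumerate(labels):
        # Each column is an independent binary problem for that label
        y_true_label = y_true[:, label_col]
        y_pred_label = y_pred[:, label_col]
        # labels=[0, 1] keeps the matrix 2x2 even if a column contains only one class
        conf_mat_dict[label] = confusion_matrix(
            y_true=y_true_label, y_pred=y_pred_label, labels=[0, 1]
        )
    
    for label, matrix in conf_mat_dict.items():
        print("Confusion matrix for label {}:".format(label))
        print(matrix)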
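
The per-label loop above can probably also be written with sklearn's built-in multilabel_confusion_matrix, which should be available if your scikit-learn version is 0.21 or newer (that version requirement is an assumption about your setup). This is only a minimal sketch on the same toy arrays, with the label names "A", "B", "C" kept as placeholders:

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[0, 0, 1], [1, 1, 0], [0, 1, 0]])
y_pred = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 0]])

# Placeholder label names, one per column of y_true / y_pred
labels = ["A", "B", "C"]

# Result has shape (n_labels, 2, 2); each slice is laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_true, y_pred)

for label, matrix in zip(labels, mcm):
    tn, fp, fn, tp = matrix.ravel()
    print("Label {}: TN={} FP={} FN={} TP={}".format(label, tn, fp, fn, tp))
```

Each 2x2 slice should match the corresponding entry of conf_mat_dict from the loop version, so either approach can be used depending on which sklearn version you have.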