Tags: python, numpy, sklearn-pandas, confusion-matrix

Calculating Confusion matrices


I am currently calculating multiple confusion matrices and normalizing them.

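import numpy as np
from sklearn.metrics import confusion_matrix

for i in range(0, 215):
    [...]  # elided in the original; iterations that raise ValueError are skipped
    matrix_confusion[i] = np.asarray(confusion_matrix(Y_test, Y_pred))
    # Row-normalize: divide each row by its row sum
    matrix_confusion[i] = (matrix_confusion[i].astype(float)
                           / matrix_confusion[i].sum(axis=1)[:, np.newaxis])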
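The normalization step divides each row by its row sum, so each row of the result is the fraction of samples of that true class assigned to each predicted class. A minimal NumPy-only sketch on hypothetical count matrices (the values are made up for illustration); keepdims=True is shown as an equivalent alternative to [:, np.newaxis]:

```python
import numpy as np

# Hypothetical 2x2 count matrix: rows = true class, columns = predicted class
counts = np.array([[3, 1],
                   [2, 2]])

# Divide each row by its own sum; both forms broadcast a (2, 1) column of row sums
normalized_a = counts.astype(float) / counts.sum(axis=1)[:, np.newaxis]
normalized_b = counts / counts.sum(axis=1, keepdims=True)
print(normalized_a)  # each row now sums to 1

# A true class with no samples has row sum 0, so its row becomes NaN (0 / 0)
no_samples = np.array([[0, 0],
                       [1, 3]])
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning for the demo
    with_nan = no_samples / no_samples.sum(axis=1, keepdims=True)
print(with_nan)
```

A matrix with such a NaN row is not all zeros, so a zeros-based filter would still count it as filled, and a plain np.mean would carry the NaN into the average.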

The goal is to calculate the mean of all confusion matrices filled in the loop above. The problem is that many matrices are never filled, because I skip an iteration whenever a ValueError is raised. So some matrices are empty (still prefilled with zeros).

Now I thought about doing the following:

matrix_confusion = matrix_confusion[matrix_confusion!=0]

But this also removes the legitimate 0s from the normalized confusion matrices. How can I get a single confusion matrix that is the mean of all the 2x2 confusion matrices actually filled in the loop, without considering the prefilled ones?
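The attempted filter fails for two reasons: the mask is applied element by element, so legitimate zeros inside filled matrices are dropped too, and boolean indexing with a mask of the same shape as the array returns a flat 1-D array, so the 2x2 structure is lost. A small demonstration on hypothetical data:

```python
import numpy as np

matrix_confusion = np.zeros((3, 2, 2))
matrix_confusion[0] = [[1.0, 0.0], [0.25, 0.75]]  # filled; contains a legitimate 0
matrix_confusion[2] = [[0.5, 0.5], [0.0, 1.0]]    # filled; contains a legitimate 0
# matrix_confusion[1] stays all zeros, like a skipped iteration

flat = matrix_confusion[matrix_confusion != 0]
print(flat.shape)  # (6,): a flat array, with the matrix shape and real zeros both gone
```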

# prefilling
matrix_confusion = np.zeros((200, 2, 2))
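Instead of inferring afterwards which slots are still all zeros, one option is to record which iterations succeeded while the loop runs. In the sketch below, compute_matrix and N_ITER are hypothetical stand-ins: compute_matrix replaces the elided [...] model code and simply raises ValueError on every third iteration, so the failure pattern is an assumption for illustration only.

```python
import numpy as np

N_ITER = 10  # hypothetical iteration count for the sketch


def compute_matrix(i):
    """Hypothetical stand-in for the elided model code; fails on every third iteration."""
    if i % 3 == 0:
        raise ValueError("simulated failure")
    counts = np.array([[5 + i, 1], [2, 4 + i]], dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


matrix_confusion = np.zeros((N_ITER, 2, 2))
filled = np.zeros(N_ITER, dtype=bool)  # True where iteration i produced a matrix

for i in range(N_ITER):
    try:
        matrix_confusion[i] = compute_matrix(i)
    except ValueError:
        continue  # slot i keeps its prefilled zeros and stays marked unfilled
    filled[i] = True

mean_matrix = matrix_confusion[filled].mean(axis=0)
print(mean_matrix)
```

The explicit mask does not depend on the prefilled value, so it still works if a filled matrix ever contains NaN rows. Appending each successful matrix to a Python list and calling np.mean(np.stack(matrices), axis=0) at the end is an equally valid variant.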

Thanks for your help!


Solution

  • First find the matrices that are not all zeros:

    valids = np.logical_or.reduce(matrix_confusion != 0, axis=(1, 2))
    

    Then compute the mean:

    matrix_confusion_mean = np.mean(matrix_confusion[valids], axis=0)
    

    You should still make sure that at least one matrix is valid; otherwise np.mean over the empty selection returns a 2x2 matrix of NaNs (and NumPy emits a "Mean of empty slice" RuntimeWarning). You could do:

    if np.any(valids):
        matrix_confusion_mean = np.mean(matrix_confusion[valids], axis=0)
    else:
        matrix_confusion_mean = np.zeros((2, 2))
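The steps of the answer can be checked end to end on a small hypothetical stack. The function name mean_of_filled is hypothetical; the body only wraps the answer's own calls, and the comment notes that np.any over the same axes yields the same mask as np.logical_or.reduce.

```python
import numpy as np


def mean_of_filled(stack):
    """Mean over a (k, 2, 2) stack, ignoring slots that are entirely zero."""
    # A slot is valid if any of its entries is nonzero
    # (equivalently: np.any(stack != 0, axis=(1, 2)))
    valids = np.logical_or.reduce(stack != 0, axis=(1, 2))
    if np.any(valids):
        return np.mean(stack[valids], axis=0)
    return np.zeros(stack.shape[1:])  # fallback when nothing was filled


stack = np.zeros((4, 2, 2))
stack[1] = [[1.0, 0.0], [0.5, 0.5]]
stack[3] = [[0.5, 0.5], [0.0, 1.0]]

result = mean_of_filled(stack)
print(result)  # mean of slots 1 and 3 only; the legitimate zeros are kept
```

If a filled matrix contains a NaN row (which happens when a class is absent from Y_test and its row sum is 0), replacing np.mean with np.nanmean averages each cell over the finite values only.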