python, keras, auc

roc_auc_score mismatch between y_test and y_score


I'm trying to calculate the following:

auc = roc_auc_score(gt, pr, multi_class="ovr")

where gt is a list of 3470208 integers with values between 0 and 41, and pr is a list of the same length (3470208) whose elements are lists of 42 probabilities that sum to 1.

However, I'm getting the following error:

ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

I'm lost here: the number of classes in y_true (gt) should be 42, because gt holds integers from 0 to 41, and each inner list of pr has 42 entries, so I would expect this to work.

Help will be appreciated!


Solution

  • Make sure that every integer from 0 to 41 (inclusive) appears at least once in gt.

    A simple example:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Raises ValueError: label 2 never appears in gt1, so only 3 classes
    # are inferred from it, while pr1 has 4 columns.
    gt1 = np.array([0, 1, 3])
    pr1 = np.array(
        [[0.1, 0.7, 0.1, 0.1],
         [0.3, 0.3, 0.2, 0.2],
         [0.5, 0.1, 0.1, 0.3]]
    )
    try:
        roc_auc_score(gt1, pr1, multi_class='ovr')
    except ValueError as err:
        print(err)

    # Does not raise: every label from 0 to 3 appears in gt2.
    gt2 = np.array([0, 2, 1, 3])
    pr2 = np.array(
        [[0.1, 0.7, 0.1, 0.1],
         [0.3, 0.3, 0.2, 0.2],
         [0.5, 0.1, 0.1, 0.3],
         [0.3, 0.3, 0.2, 0.2]]
    )
    print(roc_auc_score(gt2, pr2, multi_class='ovr'))
    

    The first call raises the error because label 2 never appears in gt1. roc_auc_score infers the classes from the distinct values in y_true, so it finds 3 classes in gt1 (0, 1 and 3), which does not match the 4 columns of pr1.
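
To find out which labels are absent from a large gt before calling roc_auc_score, a check along the following lines may help. This is a minimal sketch, not part of scikit-learn: missing_labels is a hypothetical helper name, and it assumes the classes are the integers 0 to n_classes - 1, matching the column order of pr.

```python
import numpy as np


def missing_labels(y_true, n_classes):
    """Return the class indices in range(n_classes) that never occur in y_true.

    Hypothetical helper: assumes classes are the integers 0..n_classes-1.
    """
    present = np.unique(np.asarray(y_true))
    return np.setdiff1d(np.arange(n_classes), present)


gt1 = np.array([0, 1, 3])
gt2 = np.array([0, 2, 1, 3])

print(missing_labels(gt1, 4))  # label 2 is absent from gt1
print(missing_labels(gt2, 4))  # no label is absent from gt2
```

For the data in the question, the call would be missing_labels(gt, 42). If the result is non-empty, those classes have no positive examples in gt, which is why the class count inferred from y_true falls below 42.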