roc_auc_score mismatch between y_true and y_score

I'm trying to calculate the following:

auc = roc_auc_score(gt, pr, multi_class="ovr")

where gt is a list of 3470208 integers, each between 0 and 41, and pr is a list of the same length whose elements are lists of 42 probabilities that sum to 1.
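Since pr is a plain list of lists, a quick sanity check is to convert both inputs to NumPy arrays and confirm that the shapes and row sums match the description above. This is a minimal sketch: check_inputs is a hypothetical helper name, and n_classes defaults to 42 as in the question.

```python
import numpy as np

def check_inputs(gt, pr, n_classes=42):
    """Hypothetical helper: convert gt/pr to arrays and check their shapes."""
    y_true = np.asarray(gt)
    y_score = np.asarray(pr, dtype=float)
    # One integer label per sample.
    if y_true.ndim != 1:
        raise ValueError(f"gt should be 1-D, got shape {y_true.shape}")
    # One row of n_classes probabilities per sample.
    if y_score.shape != (y_true.shape[0], n_classes):
        raise ValueError(f"pr has shape {y_score.shape}, expected "
                         f"({y_true.shape[0]}, {n_classes})")
    # Labels must lie in 0..n_classes-1. This checks the range only,
    # not whether every label actually occurs in gt.
    if y_true.min() < 0 or y_true.max() >= n_classes:
        raise ValueError("gt contains labels outside 0..n_classes-1")
    # Each row of probabilities should sum to 1.
    if not np.allclose(y_score.sum(axis=1), 1.0):
        raise ValueError("rows of pr do not all sum to 1")
    return y_true, y_score
```

Note that all of these checks pass for gt = [0, 1, 3] with a 3-by-4 pr, which is exactly why the error below is confusing: the range of gt looks right even though one label never occurs.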

However, I'm getting the following error:

ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

So I'm a bit lost: as far as I can tell, the number of classes in y_true (gt) is 42, because gt contains integers from 0 to 41. And since every inner list of pr has 42 entries, I think it should work.

Any help would be appreciated!

Solution

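Make sure that every integer from 0 to 41 (inclusive) actually occurs in gt. Unless you pass the labels parameter, roc_auc_score takes the classes from the distinct values present in y_true rather than from their range, so if even one label is missing it counts fewer than 42 classes, and the comparison with the 42 columns of pr fails.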
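To see which labels are missing, you can mirror that comparison with plain NumPy. This is a sketch, assuming one column of pr per label 0..n-1; find_missing_labels is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def find_missing_labels(gt, pr):
    """Hypothetical helper: return the column indices of pr that never occur in gt."""
    y_true = np.asarray(gt)
    n_columns = np.asarray(pr).shape[1]
    present = np.unique(y_true)  # distinct labels actually seen in gt
    print(f"{len(present)} distinct labels in gt, {n_columns} columns in pr")
    # Assumes label k corresponds to column k of pr.
    return np.setdiff1d(np.arange(n_columns), present)

gt1 = [0, 1, 3]
pr1 = [[0.1, 0.7, 0.1, 0.1],
       [0.3, 0.3, 0.2, 0.2],
       [0.5, 0.1, 0.1, 0.3]]
print(find_missing_labels(gt1, pr1))  # [2]
```

For the data in the question this would be find_missing_labels(gt, pr); every label it returns has no samples in gt.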

A simple example:

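```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Results in an error: label 2 never appears in gt1,
# so y_true has 3 distinct classes while pr1 has 4 columns.
gt1 = np.array([0, 1, 3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)
try:
    roc_auc_score(gt1, pr1, multi_class='ovr')
except ValueError as err:
    print(err)  # the "Number of classes in y_true not equal to ..." message

# Does not result in an error: every label 0..3 appears in gt2.
gt2 = np.array([0, 2, 1, 3])
pr2 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3],
     [0.3, 0.3, 0.2, 0.2]]
)
print(roc_auc_score(gt2, pr2, multi_class='ovr'))
```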

Because label 2 never appears in gt1, the call on gt1 raises the error. When labels is not given, roc_auc_score derives the classes from the distinct values in y_true, so it sees 3 classes in gt1 (0, 1 and 3) but 4 columns in pr1.
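If some labels genuinely never occur in your evaluation data, passing labels explicitly does not make the problem go away: a one-vs-rest AUC needs at least one positive and one negative sample per class, so the AUC of an absent class is undefined. One possible workaround, sketched below, is to compute the one-vs-rest AUC only for the classes that are present and take their unweighted mean. Note that this differs from roc_auc_score's own macro average, which covers every class, so the result is not directly comparable to a full 42-class AUC. ovr_auc_present_classes is a hypothetical helper name; it only calls the binary form of roc_auc_score on each column.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ovr_auc_present_classes(gt, pr):
    """Hypothetical helper: macro-averaged one-vs-rest AUC over the classes
    that have both positive and negative samples in gt."""
    y_true = np.asarray(gt)
    y_score = np.asarray(pr, dtype=float)
    scores = {}
    for k in range(y_score.shape[1]):
        y_bin = (y_true == k).astype(int)  # binary target: class k vs. the rest
        if y_bin.min() == y_bin.max():
            continue  # only one class in y_bin, AUC undefined: skip class k
        scores[k] = roc_auc_score(y_bin, y_score[:, k])
    if not scores:
        raise ValueError("no class has both positive and negative samples")
    # Unweighted mean over the classes that could be scored.
    return float(np.mean(list(scores.values()))), scores

gt1 = np.array([0, 1, 3])
pr1 = np.array([[0.1, 0.7, 0.1, 0.1],
                [0.3, 0.3, 0.2, 0.2],
                [0.5, 0.1, 0.1, 0.3]])
macro_auc, per_class = ovr_auc_present_classes(gt1, pr1)
print(sorted(per_class))  # [0, 1, 3]: class 2 is skipped
```

When every class is present (as with gt2 and pr2 above), this mean should match roc_auc_score(gt2, pr2, multi_class='ovr'), since both average the same per-class binary AUCs.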