I'm trying to calculate the following:
auc = roc_auc_score(gt, pr, multi_class="ovr")
where gt is a list of 3470208 integers between 0 and 41, and pr is a list of the same length whose elements are lists of 42 probabilities that each sum to 1.
However, I'm getting the following error:
ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'
So I am kind of lost: the number of classes in y_true (gt) should be 42, because it holds integers from 0 to 41, and since pr is a list of lists of size 42, I think it should work.
Help will be appreciated!
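Make sure that every integer between 0 and 41 (inclusive) actually occurs in gt. When no labels argument is passed, roc_auc_score takes the classes from the unique values found in y_true, so a label that never occurs makes the class count smaller than the number of columns in y_score.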
A simple example:
import numpy as np
from sklearn.metrics import roc_auc_score
# results in error: gt1 has 3 distinct labels, pr1 has 4 columns
gt1 = np.array([0, 1, 3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)
# roc_auc_score(gt1, pr1, multi_class='ovr')
# does not result in error: gt2 has 4 distinct labels, pr2 has 4 columns
gt2 = np.array([0, 2, 1, 3])
pr2 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3],
     [0.3, 0.3, 0.2, 0.2]]
)
# roc_auc_score(gt2, pr2, multi_class='ovr')
Because label 2 never appears in gt1, the first call throws the error: the number of classes in gt1 (3) is not equal to the number of columns in pr1 (4).
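To find out which labels are missing from a large gt, something along these lines may help. This is only a sketch: missing_labels is a hypothetical helper name (not part of scikit-learn), and it assumes the labels are the integers 0 to n_classes - 1, matching the column order of pr.

```python
import numpy as np

def missing_labels(y_true, n_classes):
    """Return the labels in 0..n_classes-1 that never occur in y_true.

    Hypothetical helper; assumes integer labels that index the score columns.
    """
    expected = np.arange(n_classes)
    present = np.unique(y_true)
    return np.setdiff1d(expected, present)

gt1 = np.array([0, 1, 3])
gt2 = np.array([0, 2, 1, 3])

print(missing_labels(gt1, 4))  # [2]  -> label 2 is absent, so the call fails
print(missing_labels(gt2, 4))  # []   -> every label is present
```

For the data in the question, the equivalent check would be missing_labels(gt, 42); a non-empty result lists the labels that do not occur in gt.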