Tags: r, roc, proc-r-package

How does pROC handle multi-level factor labels?


I am calculating the AUC of a model in R. The model has been trained to predict a two-level factor (good/bad). It has been applied to data that have a three-level outcome (good/bad/missing). I am fine with the scoring part. I get a probability based on a set of predictors for each observation.

The part I don't understand is what happens when I calculate the AUC with roc(data$label, data$score), because data$label now has 3 levels (good/bad/missing) while the model was trained on data with only 2 levels (good/bad). Is the new level ignored? Should I manually exclude all such observations from the data to get an accurate AUC measure?

data <- structure(list(label = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("missing", 
"good", "bad"), class = "factor"), score = c(0.151147571051044, 
0.0411329810171418, 0.0688491931089625, 0.0457818202643564, 0.0411038297454905, 
0.0652004019004794, 0.105964115208592, 0.0538514549969684, 0.0415476305130247, 
0.0533831523731155, 0.0639788335617257, 0.0434341986489527, 0.0520826001358534, 
0.0642210548642832, 0.0536219837901353, 0.0415821872079014, 0.0416555537422, 
0.0491937562992912, 0.0469082976746886, 0.0538194884632293)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

library(pROC)

roc(data$label, data$score)

Solution

  • Unless you have a particularly old version of pROC, or something is silencing diagnostic messages, it should tell you what it is doing:

    > roc(data$label, data$score)
    Setting levels: control = missing, case = good
    Setting direction: controls < cases
    

    As you can see, it uses the "missing" class as the control or negative class.
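
    This comes from the factor's level order: pROC's documented default for the levels argument is to take the first two levels of the response, using the first as the control and the second as the case; observations with any other level are dropped. You can check the ordering directly:

    > levels(data$label)
    [1] "missing" "good"    "bad"

    Here "missing" comes first and "good" second, so it is actually the single "bad" observation that gets excluded (only 19 of the 20 rows are used).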

    It carries on showing you what data was used:

    [...]
    Data: data$score in 3 controls (data$label missing) < 16 cases (data$label good).
    

    Again you can see that it is using the "missing" label as the control.

    Finally, it gives you a hint about how to solve the problem:

    [...]
    Warning message:
    In roc.default(data$label, data$score) :
      'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead
    

    In your case it is easiest to set the levels argument as suggested:

    > roc(data$label, data$score, levels=c("good", "bad"))
    Setting direction: controls > cases
    
    Call:
    roc.default(response = data$label, predictor = data$score, levels = c("good",     "bad"))
    
    Data: data$score in 16 controls (data$label good) > 1 cases (data$label bad).
    Area under the curve: 0.8125
    

    Now it correctly uses the good/bad levels as you asked.
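
    If you would rather exclude the extra level manually, as you suggested, a quick sketch like the following (the kept name is just for illustration) should give the same result here, since droplevels() preserves the good/bad ordering:

    kept <- data[data$label != "missing", ]   # drop rows with the extra level
    kept$label <- droplevels(kept$label)      # remove the unused "missing" level
    roc(kept$label, kept$score)

    This should report the same AUC of 0.8125, so setting levels is simply the more convenient option.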

    One last thing: notice that pROC is still setting the direction automatically:

    Setting direction: controls > cases
    

    You should make sure this matches the direction you obtained on the training data (that is, whether positive cases score higher or lower than negatives).

    train.roc <- roc(train.data$label, train.data$score, levels=c("good", "bad"))
    roc(data$label, data$score, levels=c("good", "bad"), direction=train.roc$direction)
    

    Failing to do so can bias your AUCs and lead you to think a predictor performs great when it doesn't.
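
    To see what is at stake, note that forcing the opposite direction on this data flips the AUC to its complement (exactly so here, because there are no tied scores):

    > roc(data$label, data$score, levels=c("good", "bad"), direction=">")$auc
    Area under the curve: 0.8125
    > roc(data$label, data$score, levels=c("good", "bad"), direction="<")$auc
    Area under the curve: 0.1875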

    In general, you want to set the levels and direction arguments explicitly whenever possible, in case the direction somehow gets reversed between training and testing.