Tags: r, r-caret, roc, proc-r-package

Error in roc.default: Predictor must be numeric or ordered


I am trying to get the ROC curve of my trained model on the test dataset.

However, I get an error:

Setting levels: control = negative, case = positive
Error in roc.default(testing_data$tested, predict_rf) : 
  Predictor must be numeric or ordered.

I have tried the answers below, without success:

SVM in R: "Predictor must be numeric or ordered."

Failure plotting ROC curve using pROC

A few months ago, someone else worked out a similar example in answer to my question here:

Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length when run confusionMatrix with caret, in R

For reproducibility, I reuse stupidWolf's example from that answer here, since my earlier problem was solved with it. However, I now run into another problem when trying to get the ROC curve.

# choose a sample

idx = sample(nrow(iris),100)
data = iris
data$Petal.Length[sample(nrow(data),10)] = NA
data$tested = factor(ifelse(data$Species=="versicolor","positive","negative"))
data = data[,-5]
training_data = data[idx,]
testing_data= data[-idx,]


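# train data
library(caret)
# ctrlInside was not shown in the original post; this definition is an assumption.
# metric = "ROC" needs class probabilities and twoClassSummary.
ctrlInside <- trainControl(method = "cv", number = 5,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)
rf <- caret::train(tested ~ ., data = training_data,
                   method = "rf",
                   trControl = ctrlInside,
                   metric = "ROC",
                   na.action = na.exclude)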

# test the model on test data

testing_data = testing_data[complete.cases(testing_data),]
evalResult.rf <- predict(rf, testing_data, type = "prob")
# predicted class = column with the highest probability
predict_rf <- factor(colnames(evalResult.rf)[max.col(evalResult.rf)],
                     levels = levels(testing_data$tested))
cm_rf_forest <- confusionMatrix(predict_rf, testing_data$tested, positive = "positive")

# get the roc
library(pROC)
rfROCt <- pROC::roc(testing_data$tested, predict_rf)

And I get the error:

Setting levels: control = negative, case = positive
Error in roc.default(testing_data$tested, predict_rf) : 
  Predictor must be numeric or ordered.

Solution

  • The second argument (the predictor) must be numeric, i.e. the predicted probability of the positive class, not the factor of predicted labels. Look at the class probabilities in the example:

    evalResult.rf <- predict(rf, testing_data, type = "prob")
    head(evalResult.rf)
    
       negative positive
    2     0.968    0.032
    8     1.000    0.000
    9     0.996    0.004
    13    0.990    0.010
    

    The second column is the probability of the positive class.

    So pass that column as the predictor (evalResult.rf[, "positive"] selects the same column by name):

    pROC::roc(testing_data$tested,evalResult.rf[,2])
    Setting levels: control = negative, case = positive
    Setting direction: controls < cases
    
    Call:
    roc.default(response = testing_data$tested, predictor = evalResult.rf[,     2])
    
    Data: evalResult.rf[, 2] in 24 controls (testing_data$tested negative) < 22 cases (testing_data$tested positive).
    Area under the curve: 0.9924
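To see why the factor of predicted labels is rejected and why the probability column is the right input, here is a minimal base-R sketch that does not need pROC or a trained model. The toy vectors response and prob_pos and the helper auc_mw are hypothetical illustrations, not data or functions from the post. The sketch computes AUC with the Mann-Whitney formulation (the probability that a random case scores higher than a random control). A plain factor is neither numeric nor ordered, which matches the error message. Hard 0/1 labels give a single threshold, so they lose the ranking information that the probabilities carry.

```r
# Hypothetical toy data, not the iris results above.
response <- factor(c("negative", "negative", "negative",
                     "positive", "positive", "positive"),
                   levels = c("negative", "positive"))
prob_pos <- c(0.10, 0.40, 0.35, 0.45, 0.80, 0.90)   # P(positive) per observation

# Hard labels obtained by thresholding at 0.5 (like predict_rf in the question)
pred_lab <- factor(ifelse(prob_pos > 0.5, "positive", "negative"),
                   levels = levels(response))

# A plain factor is neither numeric nor ordered, hence the pROC error
is.numeric(pred_lab)   # FALSE
is.ordered(pred_lab)   # FALSE

# Hypothetical helper: AUC as P(case score > control score), ties count 1/2
auc_mw <- function(response, score) {
  cases    <- score[response == "positive"]
  controls <- score[response == "negative"]
  cmp <- outer(cases, controls, "-")   # all case-minus-control differences
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

auc_prob  <- auc_mw(response, prob_pos)                 # 1: every case outranks every control
auc_label <- auc_mw(response, as.numeric(pred_lab) - 1) # 5/6: one 0/1 threshold only
```

With pROC installed, pROC::roc(response, prob_pos) accepts the numeric vector directly, and its AUC should agree with auc_prob, since the trapezoidal area under the empirical ROC curve equals the Mann-Whitney statistic.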