
What prediction format should be the input for the ROC function?


I am trying to calculate the ROC curve of a binary target variable (0/1) against the predictions of a decision tree.

When I set the prediction value to be binary, it gives me the following error:

> roc(as.numeric(pred),as.numeric(data$target))

Setting levels: control = 0, case = 1
Setting direction: controls < cases

When I set the prediction value to be a probability, it gives me the following error:

> roc(pred[,2],as.numeric(data$target))

'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead
Setting levels: control = 0.166666666666667, case = 0.232876712328767
Setting direction: controls < cases

So I am confused: what format should the prediction be in so that the ROC is calculated correctly? Why is my function showing these errors?


Solution

  • If you look at the documentation for pROC's roc function, you will see that its signature has the following form:

    ## Default S3 method:
    roc(response, predictor, [...]
    

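    The predictor is therefore the second argument, not the first as in your calls. With the arguments swapped, roc treats the predicted probabilities as the response, which is why it complains that 'response' has more than two levels. Your call should look like: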

    roc(data$target, pred[,2])
    

    If you forget the order, you can always use named arguments, which makes the order irrelevant:

    roc(predictor = pred[,2], response = data$target)
    

    Also note that it is not necessary, and not even recommended, to convert the response to a numeric vector, so I removed as.numeric from the calls above.
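
The argument order described above can be tried on synthetic data. What follows is only a minimal sketch, assuming the pROC package is installed. The vectors target and prob_pos are hypothetical stand-ins for data$target and pred[,2], not the asker's actual data. The "Setting levels" and "Setting direction" lines in the question are informational messages rather than errors, and passing levels and direction explicitly avoids them.

```r
# Minimal sketch with synthetic data; assumes the pROC package is installed.
library(pROC)

set.seed(42)
n <- 200

# Hypothetical stand-ins for data$target and pred[, 2].
target   <- rbinom(n, size = 1, prob = 0.4)        # binary response (0/1)
prob_pos <- plogis(2 * target - 1 + rnorm(n))      # score in (0, 1), higher on average for class 1

# Response first, predictor second. Giving levels (control, case) and
# direction explicitly means pROC does not have to guess them.
roc_pos <- roc(target, prob_pos, levels = c(0, 1), direction = "<")

# Named arguments give the same curve regardless of the order they are written in.
roc_named <- roc(predictor = prob_pos, response = target,
                 levels = c(0, 1), direction = "<")

# Area under the curve for the correctly ordered call.
auc(roc_pos)
```

Using the predicted probability as the predictor, rather than the hard 0/1 class, lets roc evaluate many thresholds instead of a single one.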