Tags: r, prediction, roc

Calculate the ROC curve from the binary classification output


I need to plot the ROC curve for a binary classification problem. The ROC function requires a numeric or ordered vector as the predictor, but because I have already performed the classification, my predictor is a factor with levels 0 and 1.

Is there a way to solve this problem?

> rfCarseats

Call:
 randomForest(formula = Salesdic ~ ., data = train_Carseats, proximity = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 20%
Confusion matrix:
    0  1 class.error
0 153 17   0.1000000
1  39 71   0.3545455

> prediction_rf_Carseats
  2   3   4   6  10  13  15  19  24  28  32  45  46  52  54  56  60  61  66  67  69  70  73  76  79  81 101 106 111 
  1   1   0   1   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   1   0   0   1 
116 121 128 130 139 143 149 155 161 162 163 164 167 168 171 172 176 179 186 188 189 190 191 194 195 201 203 204 206 
  0   0   0   0   1   0   0   1   0   0   0   0   0   0   0   1   0   1   0   0   0   1   1   1   1   0   0   0   0 
207 208 211 215 220 221 225 229 232 233 234 236 239 243 249 251 253 257 258 264 267 274 279 283 290 295 297 300 301 
  0   0   0   0   1   1   0   0   1   1   1   0   0   0   0   1   0   0   0   0   1   1   1   1   0   1   1   1   1 
304 306 307 308 311 312 316 318 321 323 326 331 332 336 338 339 340 346 353 356 362 363 369 370 372 374 376 385 388 
  1   0   0   0   0   0   0   0   0   1   1   0   0   0   0   0   1   0   1   1   0   0   1   1   0   0   0   1   0 
392 396 397 399 
  0   1   0   0 
Levels: 0 1

> train_Carseats$Salesdic
  [1] 1 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0
 [57] 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 0 1 1 0 0 0 0 0 1 1 1 1 0 1
[113] 0 1 1 0 0 1 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1
[169] 0 1 0 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 0 1
[225] 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1
Levels: 0 1


Solution

  • EDIT (Problem solved): Since I have a randomForest object, I can compute the ROC curve from its output with the following code:

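    library(pROC)

    # rfCarseats$votes holds the out-of-bag vote fractions per class;
    # the "1" column is used as the numeric score for the positive class.
    ROC_Carseats_RF <- roc(train_Carseats$Salesdic, rfCarseats$votes[, "1"],
                           smoothed = TRUE,
                           ci = TRUE, ci.alpha = 0.9, stratified = FALSE,
                           plot = TRUE, auc.polygon = TRUE, max.auc.polygon = TRUE, grid = TRUE,
                           print.auc = TRUE, show.thres = TRUE)
    # Redraw the curve with the AUC printed on the plot.
    plot.roc(ROC_Carseats_RF, print.auc = TRUE)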
    

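    This compares the true response values with the fraction of out-of-bag tree votes each observation received, as stored in rfCarseats$votes. Unlike the 0/1 factor of predicted classes, these vote fractions are numeric, so they can serve as the predictor for roc().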
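To make clear why a numeric score is needed, here is a minimal base R sketch of how ROC points can be derived from vote fractions by sweeping a threshold. The data are made up for illustration, and `roc_points` and `auc_trapezoid` are hypothetical helper names, not part of pROC or randomForest.

```r
# Toy data, made up for illustration: true classes and the fraction of
# trees voting for class "1" (the kind of value rfCarseats$votes[, "1"] holds).
truth <- factor(c(0, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1))
score <- c(0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.20, 0.90)

# Hypothetical helper: one (FPR, TPR) point per candidate threshold.
roc_points <- function(truth, score, positive = "1") {
  is_pos <- truth == positive
  # Start at Inf (nothing predicted positive), then lower the threshold.
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) mean(score[is_pos] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[!is_pos] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Hypothetical helper: area under the curve by the trapezoidal rule.
auc_trapezoid <- function(pts) {
  sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
}

pts <- roc_points(truth, score)
auc_value <- auc_trapezoid(pts)
```

A hard 0/1 prediction corresponds to a single threshold, so it would yield only one point between (0, 0) and (1, 1) rather than a full curve.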

    Plot is here
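The code above scores the training rows with out-of-bag votes. For a held-out set, such as the one behind prediction_rf_Carseats, a similar numeric score can be obtained with `predict(..., type = "prob")`. The sketch below uses synthetic stand-in data, and the names `dat`, `train_dat`, `test_dat`, and `rf_fit` are assumptions for illustration, not objects from the question.

```r
library(randomForest)
library(pROC)

set.seed(1)
# Synthetic stand-in data (made up); in the question this would be the
# Carseats train/test split with Salesdic as the response.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(as.integer(dat$x1 + 0.5 * dat$x2 + rnorm(n) > 0), levels = c(0, 1))
train_idx <- sample(n, 200)
train_dat <- dat[train_idx, ]
test_dat <- dat[-train_idx, ]

rf_fit <- randomForest(y ~ ., data = train_dat)

# type = "prob" returns a matrix with one column of class probabilities
# per factor level, instead of the predicted class labels.
prob_test <- predict(rf_fit, newdata = test_dat, type = "prob")

# Use the probability of class "1" as the predictor; levels gives
# (control, case) and direction = "<" says cases score higher.
roc_test <- roc(response = test_dat$y, predictor = prob_test[, "1"],
                levels = c("0", "1"), direction = "<")
plot(roc_test, print.auc = TRUE)
```

Fixing `levels` and `direction` explicitly avoids relying on pROC's automatic choice, which matters when comparing curves across models.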