
How do I compute the ROC curve and its AUC after training with caret in R?


I trained a model with the caret package's train function using 10-fold cross-validation. I also obtained class probabilities for the predictions by setting classProbs = TRUE in the trainControl object passed as trControl, as follows:

myTrainingControl <- trainControl(method = "cv", 
                              number = 10, 
                              savePredictions = TRUE, 
                              classProbs = TRUE, 
                              verboseIter = TRUE)

randomForestFit = train(x = input[3:154], 
                        y = as.factor(input$Target), 
                        method = "rf", 
                        trControl = myTrainingControl, 
                        preProcess = c("center","scale"), 
                        ntree = 50)

The saved predictions look like this:

  pred obs    0    1 rowIndex mtry Resample
1    0   1 0.52 0.48       28   12   Fold01
2    0   0 0.58 0.42       43   12   Fold01
3    0   1 0.58 0.42       51   12   Fold01
4    0   0 0.68 0.32       55   12   Fold01
5    0   0 0.62 0.38       59   12   Fold01
6    0   1 0.92 0.08       71   12   Fold01

Now I want to compute the ROC curve and the area under it (AUC) from this data. How can I do that?


Solution

  • An example that computes the AUC with ROCR from a randomForest model:

    rf_output=randomForest(x=predictor_data, y=target, importance = TRUE, ntree = 10001, proximity=TRUE, sampsize=sampsizes)
    
    library(ROCR)
    predictions=as.vector(rf_output$votes[,2])
    pred=prediction(predictions,target)
    
    perf_AUC=performance(pred,"auc") #Calculate the AUC value
    AUC=perf_AUC@y.values[[1]]
    
    perf_ROC=performance(pred,"tpr","fpr") #Compute the points of the ROC curve, then plot it
    plot(perf_ROC, main="ROC plot")
    text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
    

    Or, using pROC and caret:

    library(caret)
    library(pROC)
    data(iris)
    
    # Keep two classes only, so the problem is binary
    iris <- iris[iris$Species == "virginica" | iris$Species == "versicolor", ]
    iris$Species <- factor(iris$Species)  # drop the now-unused setosa level
    
    samples <- sample(NROW(iris), NROW(iris) * .5)
    data.train <- iris[samples, ]
    data.test <- iris[-samples, ]
    forest.model <- train(Species ~., data.train)
    
    result.predicted.prob <- predict(forest.model, data.test, type="prob") # Class probabilities on the test set
    
    result.roc <- roc(data.test$Species, result.predicted.prob$versicolor) # Build the ROC object
    plot(result.roc, print.thres="best", print.thres.best.method="closest.topleft") # Draw the ROC curve
    
    result.coords <- coords(result.roc, "best", best.method="closest.topleft", ret=c("threshold", "accuracy"))
    print(result.coords) # Threshold closest to the top-left corner and its accuracy
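
Neither example above uses the cross-validated predictions from the question, so here is a minimal sketch that does. It assumes the table shown in the question is randomForestFit$pred (where caret stores held-out predictions when savePredictions = TRUE), that the column named 1 holds the probability of the positive class, and that the table may contain rows for every mtry tried, so it should be filtered to the selected one. The six rows are copied from the question as stand-in data. The names cv_pred, best_mtry and auc_rank are illustrative and not part of caret or pROC.

```r
# Stand-in for randomForestFit$pred: the six rows shown in the question.
# check.names = FALSE keeps the probability columns named "0" and "1".
cv_pred <- data.frame(
  pred     = factor(c(0, 0, 0, 0, 0, 0), levels = c(0, 1)),
  obs      = factor(c(1, 0, 1, 0, 0, 1), levels = c(0, 1)),
  `0`      = c(0.52, 0.58, 0.58, 0.68, 0.62, 0.92),
  `1`      = c(0.48, 0.42, 0.42, 0.32, 0.38, 0.08),
  rowIndex = c(28, 43, 51, 55, 59, 71),
  mtry     = 12,
  Resample = "Fold01",
  check.names = FALSE
)

# With a real model this would be randomForestFit$bestTune$mtry;
# 12 is hard-coded here only because that is the value in the sample rows.
best_mtry <- 12
cv_pred <- cv_pred[cv_pred$mtry == best_mtry, ]

# Rank-based (Mann-Whitney) AUC: the probability that a random positive
# case gets a higher score than a random negative one, ties counting half.
auc_rank <- function(score, is_positive) {
  r  <- rank(score)              # ties receive their average rank
  n1 <- sum(is_positive)
  n0 <- sum(!is_positive)
  (sum(r[is_positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

cv_auc <- auc_rank(cv_pred[["1"]], cv_pred$obs == "1")
print(cv_auc)

# The same AUC plus the curve itself via pROC, if the package is installed.
if (requireNamespace("pROC", quietly = TRUE)) {
  cv_roc <- pROC::roc(response = cv_pred$obs, predictor = cv_pred[["1"]],
                      levels = c("0", "1"), direction = "<")
  plot(cv_roc, main = "Cross-validated ROC")
  print(pROC::auc(cv_roc))
}
```

This pools the held-out predictions from all folds into one curve. To get a per-fold AUC instead, the same function could be applied to each subset defined by the Resample column.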