r performance logistic-regression roc

ROCR does not plot standard errors


I am trying to plot a ROC curve with standard deviation bars using the ROCR package.

For a reproducible example I am using the quality.csv file, which can be found here: https://courses.edx.org/courses/course-v1:MITx+15.071x_3+1T2016/courseware/5893e4c5afb74898b8e7d9773e918208/030bf0a7275744f4a3f6f74b95169c04/

My code is the following:

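library(dplyr)       # glimpse()
library(data.table)  # fread()
library(caTools)     # sample.split()
library(ROCR)        # prediction(), performance()

data <- fread("quality.csv")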
glimpse(data)
set.seed(88)
split <- sample.split(data$PoorCare, SplitRatio = 0.75)
data_train <- data[split, ]
data_test <- data[!split, ]

#--------------------------------------------------------------------------
# FITTING A MODEL
#--------------------------------------------------------------------------
model <- glm(PoorCare ~ OfficeVisits + Narcotics , data_train, family = "binomial")

#--------------------------------------------------------------------------
# MAKE PREDICTIONS ON THE TEST DATASET
#--------------------------------------------------------------------------
predict_Test <- predict(model, type = "response", newdata = data_test)

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# THE ROCR PACKAGE
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

###########################################################################
# CREATE A PERFORMANCE OBJECT
###########################################################################
prediction_obj <- prediction(predict_Test, data_test$PoorCare)

#==========================================================================
# CALCULATE AUC
#==========================================================================
auc = as.numeric(performance(prediction_obj , "auc")@y.values)
# 0.7994792

#==========================================================================
# PLOT ROC CURVE WITH ERROR ESTIMATES
#==========================================================================
perf <- performance(prediction_obj, "tpr", "fpr")
plot(perf, colorize=TRUE, avg='threshold', spread.estimate='stddev', spread.scale = 2)

What I get is a ROC curve, but without the error bars.

Could you indicate what is wrong with my code and how to correct it?

Your advice will be appreciated.


Solution

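  • ROCR can only draw standard deviations (or other spread estimates) around a ROC curve when the prediction object holds several runs, for example from repeated cross-validation or bootstrap predictions. With a single vector of predictions there is only one curve, so there is nothing to average and no spread to plot.
    For example, consider 100 repeated splits of the data into training and test sets, fitting the glm and predicting on each test set: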

    library(dplyr)
    library(data.table)
    library(caTools)
    library(ROCR)
    data <- fread("quality.csv")
    glimpse(data)
    
    set.seed(1)
    reps <- 100
    # One list element per repetition: test-set predictions and true labels
    predTests <- vector(mode = "list", length = reps)
    Labels <- vector(mode = "list", length = reps)
    for (k in seq_len(reps)) {
        # Draw a fresh 75/25 split (class ratio preserved) on each repetition
        splitk <- sample.split(data$PoorCare, SplitRatio = 0.75)
        data_traink <- data[splitk, ]
        data_testk <- data[!splitk, ]
        model <- glm(PoorCare ~ OfficeVisits + Narcotics,
                     data = data_traink, family = "binomial")
        predTests[[k]] <- predict(model, type = "response", newdata = data_testk)
        Labels[[k]] <- data_testk$PoorCare
    }
    

    Now calculate prediction and performance objects using the predTests and Labels lists:

    predObjs <- prediction(predTests, Labels)
    Perfs <- performance(predObjs , "tpr", "fpr")
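
    When prediction() is given lists, the same object can also be used to summarize the AUC across runs. The following is a minimal sketch, not part of the original answer: it uses the ROCR.xval example data that ships with ROCR as a stand-in for the predTests and Labels lists, and the names pred_runs and auc_runs are illustrative.

    ```r
    library(ROCR)

    # ROCR.xval ships with ROCR and holds predictions and labels for several
    # cross-validation runs; it stands in for predTests / Labels here.
    data(ROCR.xval)
    pred_runs <- prediction(ROCR.xval$predictions, ROCR.xval$labels)

    # With list input, performance() returns one AUC value per run
    auc_runs <- unlist(performance(pred_runs, "auc")@y.values)

    length(auc_runs)  # number of runs
    mean(auc_runs)    # average AUC across runs
    sd(auc_runs)      # spread of the AUC across runs
    ```

    Replacing pred_runs with predObjs should give the corresponding summary for the 100 repeated splits.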
    

    Then plot the individual ROC curves together with the threshold-averaged curve and its standard deviation bars:

    plot(Perfs, col="grey82", lty=3)
    plot(Perfs, lwd=3, avg="threshold", spread.estimate="stddev", add=TRUE, colorize=TRUE)
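
    The difference between a single run and several runs can be checked directly on the performance objects, and other documented spread estimates can be tried the same way. This is a hedged sketch using the ROCR.xval data bundled with ROCR rather than quality.csv; the object names (pred_one, perf_one, pred_runs, perf_runs) are made up for illustration.

    ```r
    library(ROCR)

    data(ROCR.xval)

    # Single run: the performance object holds exactly one curve,
    # so there is nothing to average and no spread to draw.
    pred_one <- prediction(ROCR.xval$predictions[[1]], ROCR.xval$labels[[1]])
    perf_one <- performance(pred_one, "tpr", "fpr")
    length(perf_one@x.values)

    # Several runs: one curve per run, which is what avg / spread.estimate need
    pred_runs <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
    perf_runs <- performance(pred_runs, "tpr", "fpr")
    length(perf_runs@x.values)

    # Vertical averaging with standard error bars instead of standard deviations
    plot(perf_runs, avg = "vertical", spread.estimate = "stderror", lwd = 2)

    # Vertical averaging with box plots at selected false positive rates
    plot(perf_runs, avg = "vertical", spread.estimate = "boxplot", lwd = 2,
         show.spread.at = seq(0.1, 0.9, by = 0.1))
    ```

    The choice between "stddev", "stderror" and "boxplot" depends on whether the aim is to show the variability between runs or the uncertainty of the averaged curve.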
    
