Search code examples
rlogistic-regressionstatistics-bootstrap

R: compute the bootstrap estimate using rms:bootcov on caret model and glm model


How do I compute the bootstrap estimate for my regression coefficients using the function bootcov from the package rms? I tried the below with sample dataset but got an error:

library(mlbench)
data(PimaIndiansDiabetes)

library(caret)
trControl <- trainControl(method = "repeatedcv",
                          repeats = 3,
                          classProbs = TRUE,
                          number = 10, 
                          savePredictions = TRUE,
                          summaryFunction = twoClassSummary)

caret_model <- train(diabetes~., 
                     data=PimaIndiansDiabetes, 
                     method="glm", 
                     trControl=trControl)

library(rms)
set.seed(1234)
reduced_model_bootcov <- bootcov(caret_model$finalModel, B=100)

The error is:

Error in bootcov(caret_model$finalModel, B = 100) : you did not specify x=TRUE and y=TRUE in the fit

If I use the function glm to build the model, this is what I would do:

model <- glm(diabetes~., 
             data=PimaIndiansDiabetes, 
             family=binomial,
             x=TRUE, y=TRUE)
model_bootcov <- bootcov(model, B=100)

But again, I got a different error:

Error in bootcov(model, B = 100) : fitter not valid


Solution

  • Turns out there is a fitting function called Glm in rms, which is a wrapper around glm, but you can also use it if you are interested in using bootcov. So for bootcov to work:

    library(mlbench)
    library(rms)
    data(PimaIndiansDiabetes)
    
    model <- rms::Glm(diabetes~., 
                 data=PimaIndiansDiabetes, 
                 family=binomial,
                 x=TRUE, y=TRUE)
    model_bootcov <- bootcov(model, B=1000)
    

    To use boot:

    library(boot)
    glm.fun <- function(dat, inds){
      fit <- glm(diabetes~.,family=binomial,data=dat[inds,])
          coef(fit)
         }
    model_boot <- boot(PimaIndiansDiabetes, glm.fun, R = 1000)
    

    We can compare how the two different models bootstrap, of course the seeds are different and most likely you need to set the similar seeds first:

    library(tidyr)
    library(dplyr)
    library(ggplot2)
    
    melt_matrix = function(mat,NAMES,X){
    colnames(mat) = NAMES
    data.frame(mat) %>% 
    tibble::rownames_to_column("B") %>% 
    pivot_longer(-B) %>%
    mutate(type=X)
    }
    
    VAR = names(coef(model))
    
    plotdf = rbind(
    melt_matrix(model_boot$t,VAR,"boot"),
    melt_matrix(model_bootcov$boot.Coef,VAR,"bootcov")
    )
    
    ggplot(plotdf,aes(x=type,y=value))+ geom_violin() + facet_wrap(~name,scale="free_y")
    

    enter image description here