Search code examples
rglmresamplingstatistics-bootstrap

Resampling in logistic regression


I have a simple dataset with one Y and 10 predictors (X1-X10) coded either 0,1 or 2 for 100 observations.

 n <- 100
 Y <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
 X1 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.4,0.5))
 X2 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.5,0.25,0.25))
 X3 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.3,0.4,0.4))
 X4 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
 X5 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.2,0.7))
 X6 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.8,0.1,0.1))
 X7 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.1,0.8))
 X8 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
 X9 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3))
X10 <- c(0,2,2,2,2,2,2,2,0,2,0,2,2,0,0,0,0,0,2,0,0,2,2,0,0,2,2,2,0,2,0,2,0,2,1,2,1,1,1,1,1,1,1,1,1,1,1,0,1,2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0)

datasim <- data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)

I am trying to do bootstrap resampling as follows which works in producing 100 different set of samples for one variable.

 B <- 100
 n <- length(datasim$X1)
 boot.samples <- matrix(sample(datasim$X1, size=B*n, replace=TRUE),B,n)

Now, I am trying to incorporate a function to calculate deviance difference using GLM. My desire is to produce dDeviance for each of the bootstrap samples (100 values). I tried the following function, but it only gives me 100 similar values of dDeviance.

 xfunction <- function(x){
 glmfit <- glm(Y~X1, family="binomial", data=datasim)
 dDeviance <- glmfit$null.deviance-glmfit$deviance
 return(dDeviance)
 }

 boot.statistics <- apply(boot.samples,1,xfunction)

Solution

  • The argument for xfunction when used in apply like this is a row from the matrix. In your original code that row wasn't being used and you were running the function for the same data each time. One approach to this kind of problem would be to change the data argument in glm to be your new data each time as has been suggested (glmfit <- glm(Y~X1, family="binomial", data=x)), but this assumes that x will be a dataframe with columns named Y and X1, whereas what you actually have as x is a vector of values for X1. The easiest solution is to change X1 in each fit.

    xfunction <- function(x){
      glmfit <- glm(Y~x, family="binomial")
      dDeviance <- glmfit$null.deviance-glmfit$deviance
      return(dDeviance)
    }
    
    boot.statistics <- apply(boot.samples,1,xfunction)