Search code examples
rglmstatistics-bootstrap

First Difference Bootstrap from Negative Binomial


novice here. I am fitting a negative binomial model on count data where Y is the count of events, D is the treatment, and X is a logarithmic offset:

out <- glm.nb(y ~ d + offset(log(x)),data=d1)

I would like to bootstrap the confidence intervals of the first difference between D=1 and D=0. I've gotten this far, but not sure if it is the correct approach:

holder <- matrix(NA,1200,1)
out <- out <- glm.nb(y ~ d + offset(log(x)),data=d1)

for (i in 1:1200){
q <- sample(1:nrow(d1), 1)
d2 <- d1[q,]
d1_1 <- d1_2 <- d2
d1_1$d <- 1
d1_2$d <- 0
d1pred <- predict(out,d1_1,type="response")
d2pred <- predict(out,d1_2,type="response")
holder[i,1] <- (d1pred[1] - d2pred[1])
}
mean(holder)

Is this the correct way to bootstrap the first difference?


Solution

  • Generally, your approach is ok, but you can do it in more R-ish way. Firstly, if you are serious about bootstrapping you can employ boot library and benefit from more compact code, no loops and many other advanced options.

    In your case it can look like:

    ## Data generation
    N <- 100
    set.seed(1)
    d1 <- data.frame(y=rbinom(N, N, 0.5),
                     d=rbinom(N, 1, 0.5),
                     x=rnorm(N, 10, 3))
    ## Model
    out <- glm.nb(y ~ d + offset(log(x)), data=d1)
    
    ## Statistic function (what we are bootstrapping)
    ## Returns difference between D=1 and D=0
    diff <- function(x,i,model){
      v1 <- v2 <- x[i,]
      v1$d <- 1
      v2$d <- 0
      predict(model,v1,type="response") - predict(model,v2,type="response")
    }
    
    ## Bootstrapping itself
    b <- boot(d1, diff, R=5e3, model=out)
    mean(b$t)
    

    Now b$t holds bootstrapped values. See names(b) and/or ?boot for extra information.

    Bootstrapping is time consuming operation, and one of the obvious advantage of boot library is support for parallel operations. It's as easy as:

     b <- boot(d1, diff, R=5e3, model=out, parallel="multicore", ncpus=2)
    

    If you are on Windows use parallel="snow" instead.