Search code examples
rglmstatistics-bootstrap

Correcting (or bootstrapping) the standard errors for a two stage glm (subscript out of bounds)


Cross posted on CrossValidated.

I am trying to bootstrap my results for a variation of the 2SLS approach (2SRI), based on this link. For some reason, the bootstrap does not produce any results.

library(sure) # for residual function and sample data sets
library(MASS) # for polr function
rm(df1)
df1 <- df1
df1$x2 <- df2$y
df1$x3 <- df2$x
df1$y <- df3$x/10
fit.polr <- polr(x2 ~ x + x3, data = df1, method = "probit")

# BOOTSTRAPPING
library(boot)
glm_2siv_coef <- function(data,indices){
  d <- data[indices,]
  stage_1 <- polr(x2 ~ x + x3, Hess=TRUE, data=d)
  stage_2 <- glm(y ~ x + x2 + x3 + resids(stage_1),
  family = "quasibinomial", data=d)
  return(summary(stage_2)$estimate["x2",1])
}

boot.results <- boot(data=df1,statistic=glm_2siv_coef,R=1)
boot.results

Produces: Error in boot.out$t[, index] : subscript out of bounds

When I look at boot.results, the t0 and t values are empty, and they are not supposed to be.

Could anyone help me figure out why?


Solution

  • Your variable x2 is an ordinal variable, so when you call it in the glm, you will have n-1 coefficients, because the default contrast for ordinal dependent variable is contrast.poly, and it transform it into its quadratic terms (see more here):

    glm(y ~ x + x2 + x3 ,data=df1)
    
    Call:  glm(formula = y ~ x + x2 + x3, data = df1)
    
    Coefficients:
    (Intercept)            x         x2.L         x2.Q         x2.C         x2^4  
     -5.561e-16    1.000e-01    1.156e-17   -1.352e-18   -2.349e-17   -2.255e-17  
             x3  
     -5.294e-18 
    

    The code you got from the link is a summary on another R object, but for glm, if you only need the coefficients, there's no harm in returning all the coefficients:

    glm_2siv_coef <- function(data,indices){
      d <- data[indices,]
      stage_1 <- polr(x2 ~ x + x3, Hess=TRUE, data=d)
      stage_2 <- glm(y ~ x + x2 + x3 + resids(stage_1),
                     family = "quasibinomial", data=d)
      coefficients(stage_2)
    }
    boot.results <- boot(data=df1, statistic=glm_2siv_coef, R=10)
    

    You can look the the bootstrap results like this:

    mat = boot.results$t
    colnames(mat) = names(glm_2siv_coef(df1,1:nrow(df1)))
    
    head(mat)
         (Intercept)         x         x2.L        x2.Q         x2.C          x2^4
    [1,]   -2.298085 0.4555501 -0.019243084 -0.01222671  0.003672487  0.0009351758
    [2,]   -2.287667 0.4527313  0.024065483 -0.01124712  0.005012817  0.0028092668
    [3,]   -2.297377 0.4543327  0.042194399 -0.02041060  0.007764768 -0.0025922548
    [4,]   -2.301298 0.4565478  0.009459445 -0.01958682  0.000313143 -0.0058883027
    [5,]   -2.296350 0.4544981 -0.010628593 -0.01141557 -0.005545030 -0.0030184318
    [6,]   -2.297909 0.4537336  0.004486517 -0.01576363  0.016248869 -0.0005906523
                   x3 resids(stage_1)
    [1,] 9.114145e-04    0.0015985900
    [2,] 1.187623e-03   -0.0020883363
    [3,] 1.442589e-03   -0.0027610646
    [4,] 9.258333e-05   -0.0021435513
    [5,] 1.963208e-03    0.0005474028
    [6,] 2.218785e-03    0.0014221113
    

    The x2 related ones will be:

    mat[,2:5]