
How to run a stratified bootstrapped linear regression in R?


In my model, x is a categorical variable with three categories (0, 1 and 2), where 0 is the reference category. Category 0 has many more observations than categories 1 and 2, so to avoid biased bootstrap samples I want to do stratified bootstrapping, but I could not find a suitable method for that.

df <- data.frame(x = c(0, 0, 0, 0, 0, 1, 1, 2, 2),
                 y = c(10, 11, 10, 10, 12, 17, 16, 20, 19),
                 m = c(6, 5, 6, 7, 2, 10, 14, 8, 11))
df$x <- as.factor(df$x)
df$x <- relevel(df$x, ref = "0")


fit <- lm(y ~ x*m, data = df)

summary(fit)
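For reference, "stratified" here means drawing rows with replacement separately inside each level of x, so every bootstrap sample keeps the same 5/2/2 group sizes as the original data. A minimal base-R sketch of just that resampling step, assuming the toy data above; `stratified_indices()` is a hypothetical helper, not a function from any package:

```r
# Toy data from the question; x is the stratifying factor (0 = reference).
df <- data.frame(x = factor(c(0, 0, 0, 0, 0, 1, 1, 2, 2)),
                 y = c(10, 11, 10, 10, 12, 17, 16, 20, 19),
                 m = c(6, 5, 6, 7, 2, 10, 14, 8, 11))

# Hypothetical helper: resample row numbers with replacement separately
# within each level of `strata`, so each level keeps its original row count.
stratified_indices <- function(strata) {
  groups <- split(seq_along(strata), strata)
  unlist(lapply(groups, function(g) g[sample.int(length(g), replace = TRUE)]),
         use.names = FALSE)
}

set.seed(1)
i <- stratified_indices(df$x)
boot_df <- df[i, ]
table(boot_df$x)  # same counts per level as table(df$x): 5, 2, 2
```

Indexing with `g[sample.int(length(g), replace = TRUE)]` instead of `sample(g, replace = TRUE)` sidesteps `sample()` treating a single number n as `1:n` when a level happens to contain only one row.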

Solution

  • Expanding on Roland's answer in the comments: `boot()` from the boot package resamples within groups when you pass `strata`, and you can then harvest a confidence interval for each coefficient using `boot.ci`:

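    library(boot)
    
    # Statistic: refit the model on the resampled rows i of DF and return the
    # coefficients. strata = df$x makes boot() resample within each level of x.
    # (The \(DF, i) shorthand for function(DF, i) needs R >= 4.1.)
    b <- boot(df, \(DF, i) coef(lm(y ~ x*m, data = DF[i, ])), strata = df$x, R = 999)
    
    # For each coefficient, compute a normal-approximation 95% interval and
    # collect estimate, lower and upper bounds into one data frame.
    result <- do.call(rbind, lapply(seq_along(b$t0), function(i) {
      m <- boot.ci(b, type = 'norm', index = i)$normal
      data.frame(estimate = b$t0[i], lower = m[2], upper = m[3])
      }))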
    
    result
    #>               estimate      lower       upper
    #> (Intercept) 12.9189189 10.7166127 15.08403731
    #> x1           6.5810811  2.0162637  8.73184665
    #> x2           9.7477477  6.9556841 11.37390826
    #> m           -0.4459459 -0.8010925 -0.07451434
    #> x1:m         0.1959459 -0.1842914  0.55627896
    #> x2:m         0.1126126 -0.2572955  0.48352616
    

    You can also plot the results like this:

    library(ggplot2)
    
    # One row per coefficient: point estimate plus its bootstrap interval.
    ggplot(within(result, var <- rownames(result)), aes(estimate, var)) +
      geom_vline(xintercept = 0, color = 'gray') +
      geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.1) +
      geom_point(color = 'red') +
      theme_light()
    

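As a cross-check that does not depend on the boot package, here is a minimal base-R sketch of the same stratified bootstrap. The helper names `stratified_indices()` and `fit_coef()` are hypothetical. The interval follows, to my understanding, the same bias-corrected normal formula that `boot.ci(type = 'norm')` uses: the estimate minus the bootstrap bias, plus or minus z times the bootstrap standard deviation. Because resampling is random, the numbers will differ somewhat from the output above.

```r
# Same toy data as in the question.
df <- data.frame(x = factor(c(0, 0, 0, 0, 0, 1, 1, 2, 2)),
                 y = c(10, 11, 10, 10, 12, 17, 16, 20, 19),
                 m = c(6, 5, 6, 7, 2, 10, 14, 8, 11))

# Hypothetical helper: resample row numbers with replacement within each level.
stratified_indices <- function(strata) {
  groups <- split(seq_along(strata), strata)
  unlist(lapply(groups, function(g) g[sample.int(length(g), replace = TRUE)]),
         use.names = FALSE)
}

# Hypothetical helper: the statistic, i.e. the coefficients of the question's model.
fit_coef <- function(d) coef(lm(y ~ x * m, data = d))

set.seed(42)
R <- 999
t0 <- fit_coef(df)                                    # estimates on the full data
boot_t <- t(replicate(R, fit_coef(df[stratified_indices(df$x), ])))  # R x 6 matrix

# Bias-corrected normal interval: (t0 - bias) +/- z * sd, with bias = mean - t0.
# Resamples where a coefficient is not estimable (NA) are dropped per column.
z    <- qnorm(0.975)
bias <- colMeans(boot_t, na.rm = TRUE) - t0
se   <- apply(boot_t, 2, sd, na.rm = TRUE)
manual <- data.frame(estimate = t0,
                     lower = t0 - bias - z * se,
                     upper = t0 - bias + z * se)
manual
```

With only two rows in levels 1 and 2, a resample that draws the same row twice leaves no spread in m within that level, so `x1:m` or `x2:m` comes back as NA for that replicate; that is why the summaries use `na.rm = TRUE`.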