
ClusterBootstrap::clusbootglm() taking a long time to run


I am using the clusbootglm() function from the ClusterBootstrap package, and it is taking an unusually long time to run. The data frame contains only 900 rows and 4 columns.

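library(ClusterBootstrap)

clusfunc <- function(df1) {
  # Cluster bootstrap GLM with B = 900 resamples per call
  mod1 <- clusbootglm(y ~ treat + u, data = df1, clusterid = group,
                      family = gaussian, B = 900)
  coef(mod1)[[2]]  # second coefficient (treat)
}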

betasclustered <- replicate(1000, clusfunc(df1))

Here is the documentation for this function.

Running the function once takes about a second, but running it 1000 times takes far longer than 1000 seconds. Do you have any advice? Should I write my own function instead of using clusbootglm()?
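
One rough way to check whether the cost really scales linearly with the number of replicates is to time a handful of calls and extrapolate. The helper below is only a sketch: estimate_runtime() and its arguments are hypothetical names, not part of ClusterBootstrap, and a stand-in workload is used so the snippet runs on its own.

```r
# Hypothetical helper: time a few calls of f() and extrapolate linearly.
estimate_runtime <- function(f, n_trial = 5, n_total = 1000) {
  elapsed <- system.time(replicate(n_trial, f()))[["elapsed"]]
  per_call <- elapsed / n_trial
  c(per_call = per_call, projected_total = per_call * n_total)
}

# Stand-in workload so the sketch is self-contained; with the code above,
# the call would be estimate_runtime(function() clusfunc(df1)).
est <- estimate_runtime(function() Sys.sleep(0.01), n_trial = 3, n_total = 100)
print(est)
```

If the projected total is far below the observed runtime, the per-call cost is probably not constant across replicates.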


Solution

  • Rather than using clusbootglm(), I can use the following function. I have tested it, and it takes only a few seconds to run 1000 iterations. It is still unclear to me why clusbootglm() took so long (over 45 minutes), but this is a good alternative.

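    library(dplyr)  # %>%, sample_n()
    library(tidyr)  # nest(), unnest()

    getclusteredsamplecoef <- function(df1) {
      # Resample whole clusters with replacement, then refit the model
      boot_sample <- df1 %>%
        nest(df1 = -group) %>%             # one row per cluster
        sample_n(180, replace = TRUE) %>%  # draw 180 clusters
        unnest(df1)                        # expand back to observations
      model <- lm(y ~ treat + u, data = boot_sample)
      model$coefficients[[2]]              # second coefficient (treat)
    }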
    

    Note that this is not exactly the same as clusbootglm(), because I fit a linear model with lm() rather than a generalized linear model. To change that, use glm() (or lm_robust() from the estimatr package) in place of lm() in the function.

    Passing 180 to sample_n() draws 180 groups. In my sample there are 5 individuals in each group, so this yields 900 observations. To get a particular number of observations, divide that number by the group size and pass the result to sample_n().
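
The cluster-count arithmetic and the resampling loop can be sketched in base R as follows. This is a minimal illustration rather than the exact function above: the data are simulated (assuming 180 groups of 5 rows each), and cluster_boot_coef() is a hypothetical helper that swaps the dplyr/tidyr pipeline for split() and sample().

```r
set.seed(1)

# Simulated stand-in for df1 (assumed layout: 180 groups of 5 rows each)
per_group <- 5
n_groups  <- 180
df1 <- data.frame(
  group = rep(seq_len(n_groups), each = per_group),
  treat = rep(rbinom(n_groups, 1, 0.5), each = per_group),
  u     = rnorm(n_groups * per_group)
)
df1$y <- 1 + 2 * df1$treat + 0.5 * df1$u + rnorm(nrow(df1))

# Number of clusters to draw for a target number of observations
target_obs <- 900
n_clusters <- target_obs / per_group

# Base-R cluster resample (hypothetical helper, not the dplyr version)
cluster_boot_coef <- function(df, n_clusters) {
  rows_by_group <- split(seq_len(nrow(df)), df$group)       # row indices per cluster
  drawn <- sample(names(rows_by_group), n_clusters, replace = TRUE)
  boot  <- df[unlist(rows_by_group[drawn], use.names = FALSE), ]
  coef(lm(y ~ treat + u, data = boot))[[2]]                  # coefficient on treat
}

betas <- replicate(1000, cluster_boot_coef(df1, n_clusters))
sd(betas)  # cluster-bootstrap standard error of the treat coefficient
```

Passing n_clusters as an argument instead of hard-coding 180 keeps the function usable when the group size or the target number of observations changes.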