I am using the clusbootglm() function from the ClusterBootstrap package. It is taking an unusually long time to run. The data frame only contains 900 rows and 4 columns.
library(ClusterBootstrap)

clusfunc <- function(df1) {
  # Cluster bootstrap of a Gaussian GLM with B = 900 resamples
  mod1 <- clusbootglm(y ~ treat + u, data = df1, clusterid = group,
                      family = gaussian, B = 900)
  # Return the second coefficient (the estimate for treat)
  coef(mod1)[[2]]
}

betasclustered <- replicate(1000, clusfunc(df1))
Running one iteration of the function takes about a second, but running 1000 iterations takes far longer than 1000 seconds. Do you have any advice? Should I write a different function myself instead of using clusbootglm()?
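In case it helps anyone diagnose this, a quick sanity check is to time one call and then a small batch before launching the full 1000 iterations. This is only a sketch and assumes df1 and clusfunc() are defined as above:

system.time(clusfunc(df1))                 # one iteration
system.time(replicate(10, clusfunc(df1)))  # ten iterations, to see how the cost scales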
Rather than using clusbootglm(), I can use the following function. I have tested it, and it takes only a few seconds to run 1000 iterations. It is still unclear to me why clusbootglm() took so long (over 45 minutes), but this is a good alternative.
library(dplyr)
library(tidyr)

getclusteredsamplecoef <- function(df1) {
  # Resample 180 whole clusters with replacement, then refit the model
  sample <- df1 %>%
    group_by(group) %>%
    nest(df1 = -group) %>%   # one row per cluster, data stored in a list-column
    ungroup() %>%
    sample_n(180, replace = TRUE) %>%
    unnest(df1)              # expand the resampled clusters back into rows
  model <- lm(y ~ treat + u, sample)
  # Return the second coefficient (the estimate for treat)
  return(model$coefficients[[2]])
}
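To mirror the replicate() call from the question, usage would be something like the following; the standard error and percentile interval are just two common ways to summarize the bootstrap draws:

betasclustered <- replicate(1000, getclusteredsamplecoef(df1))
sd(betasclustered)                        # bootstrap standard error of the treat coefficient
quantile(betasclustered, c(0.025, 0.975)) # 95% percentile interval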
It is worth noting that this is not exactly the same as clusbootglm(), because I fit a linear model rather than a generalized linear model. This could be changed by using glm() or lm_robust() in place of lm() in the function, as sketched below.
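For example, a glm() version with a Gaussian family (the same family used in the clusbootglm() call above) only changes one line. The function name getclusteredsamplecoefglm is just a placeholder, and I have not benchmarked this variant:

getclusteredsamplecoefglm <- function(df1) {
  # Same cluster resampling as above, but fitting a Gaussian GLM instead of lm()
  sample <- df1 %>%
    group_by(group) %>%
    nest(df1 = -group) %>%
    ungroup() %>%
    sample_n(180, replace = TRUE) %>%
    unnest(df1)
  model <- glm(y ~ treat + u, data = sample, family = gaussian)
  return(coef(model)[[2]])
}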
Setting n = 180 resamples 180 groups. In my sample there are 5 individuals within each group, so this yields 900 observations. If you want a particular number of observations, divide that number by the number of individuals per group (assuming equal-sized groups) and use the result as the input to sample_n().
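Rather than hard-coding 180, you could also compute the number of clusters directly from the data, which keeps each resample the same size as the original sample (n_distinct() is from dplyr):

nclusters <- n_distinct(df1$group)  # 180 in my data
# then use sample_n(nclusters, replace = TRUE) inside the function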