Tags: r, glm, statistics-bootstrap

clusterSEs: error when bootstrapping SEs for a glm with interaction terms


I am running a logistic regression using glm() and want to calculate standard errors using cluster.bs.glm() from clusterSEs.

The first bit of code throws an error:

mod1 <- glm(lfp ~ age + I(age^2) + genstat + married +
            isced + factor(syear) + 
            I(factor(syear):married), 
            data = subw, 
            family=binomial(link='logit'))

library(clusterSEs)
head(subw)
se <- cluster.bs.glm(mod=mod1, dat=subw, cluster= ~pid ,  boot.reps = 10)

Error in cl(dat, mod, clust)[ind.variables, 2] : subscript out of bounds

When I remove the interaction term there is no problem:

mod1 <- glm(lfp ~ age + I(age^2) + genstat + married +
            isced + factor(syear), 
            data = subw, 
            family=binomial(link='logit'))


se <- cluster.bs.glm(mod=mod1, dat=subw, cluster= ~pid ,  boot.reps = 10)

Is there a programming reason why this should not work? glm reports all coefficients of the interaction term (some of them are NA), so I'd expect the code above to work nevertheless.


Solution

  • It's tough to troubleshoot this without a reproducible example. However, one potential solution is to compute the interaction term as its own column before fitting, rather than inside the model formula, as Esarey does in his example on GitHub.

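    library(dplyr)  # provides mutate() and %>%

    # var_1 and var_2 are placeholders for the two (numeric) interacting variables
    subw <- subw %>% mutate(your_interaction = var_1 * var_2)

    # Use the precomputed column in place of the in-formula interaction
    mod1 <- glm(lfp ~ age + I(age^2) + genstat + married +
                isced + factor(syear) + your_interaction,
                data = subw,
                family = binomial(link = 'logit'))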
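
    For a factor-by-indicator interaction like factor(syear) with married, a single product column is not enough, because a factor cannot be multiplied directly. A minimal base-R sketch of one way to do it is below. It assumes married is coded 0/1, and it uses a simulated data frame (toy) in place of subw, so every value and the married_x_ column names are made up for illustration. It also assumes, without confirmation from the question, that the NA coefficients are what trips up cluster.bs.glm(), so it finishes by checking that none remain.

```r
# Simulated stand-in for subw; all values are invented for illustration.
set.seed(1)
n <- 400
toy <- data.frame(
  pid     = rep(seq_len(n / 4), each = 4),    # cluster id (person)
  syear   = rep(2010:2013, times = n / 4),    # survey year
  married = rbinom(n, 1, 0.5),                # assumed 0/1 indicator
  age     = sample(20:60, n, replace = TRUE)
)
toy$lfp <- rbinom(n, 1, plogis(-1 + 0.03 * toy$age + 0.4 * toy$married))

# One numeric column per non-reference year: married * 1(syear == year).
# The first year is skipped because the married main effect already covers it.
for (yr in sort(unique(toy$syear))[-1]) {
  toy[[paste0("married_x_", yr)]] <- toy$married * (toy$syear == yr)
}

# Build the formula from the generated column names.
int_cols <- grep("^married_x_", names(toy), value = TRUE)
fml <- reformulate(
  c("age", "I(age^2)", "married", "factor(syear)", int_cols),
  response = "lfp"
)

mod_toy <- glm(fml, data = toy, family = binomial(link = "logit"))

# Check that every coefficient was estimated before bootstrapping.
anyNA(coef(mod_toy))
```

    If that check returns FALSE on your real data, the model can be passed to cluster.bs.glm() exactly as in the question. If it returns TRUE, some interaction columns are collinear with other terms and need to be dropped first.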