r statistics regression survey

Properly accounting for cluster effects in survey-weighted logistic regression?


I have a dataset with person-specific survey weights, which I am using to predict the probability of cigarette smoking. The survey is designed so that every member of each randomly sampled household is interviewed. How do I account for the fact that I have multiple people from each household?

Generalized estimating equations (geeglm) do not work with survey weights. What about cluster-robust standard errors? svyglm does not fit by maximum likelihood, though.

Here is what it currently looks like:

dsgn <- svydesign(ids = ~personID, weights = ~weight, strata = householdID, data = data)

model1 <- svyglm(design = dsgn, formula = smoking ~ education + jobPrestige, family = "binomial")

Thanks a lot!


Solution

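  • svyglm already does what you are asking for, but your svydesign call is wrong. Based on your description, households are the sampling clusters, so householdID belongs in the ids argument rather than strata. It should be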

    dsgn <- svydesign(ids = ~householdID, weights = ~weight, data = data)
    
    

    [You might also have strata that you haven't told us about, but the householdID isn't a sampling stratum, because one of the defining features of strata is that you sample from every population stratum.]
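
As a rough end-to-end sketch of the corrected setup, the code below fits the model from the question on simulated data. The data are entirely made up for illustration (household sizes, weights, and effect sizes are assumptions, not taken from the question); only the variable names follow the original post. It uses family = quasibinomial(), which the survey package documentation recommends to avoid a warning about non-integer numbers of successes; the point estimates and standard errors are the same as with binomial. A second design that clusters on personID is included only so the standard errors can be compared side by side.

```r
library(survey)

set.seed(1)

# Hypothetical toy data: 200 households with 1 to 4 members each
n_hh <- 200
hh_size <- sample(1:4, n_hh, replace = TRUE)
data <- data.frame(householdID = rep(seq_len(n_hh), times = hh_size))
n <- nrow(data)
data$personID <- seq_len(n)
data$weight <- runif(n, 0.5, 2)                 # made-up person-level weights
data$education <- sample(1:5, n, replace = TRUE)
data$jobPrestige <- rnorm(n, mean = 50, sd = 10)

# Shared household effect so that members of a household are correlated
hh_effect <- rnorm(n_hh, sd = 1)[data$householdID]
data$smoking <- rbinom(n, 1, plogis(-0.5 - 0.2 * data$education + hh_effect))

# Corrected design: households are the clusters (primary sampling units)
dsgn <- svydesign(ids = ~householdID, weights = ~weight, data = data)

# For comparison only: a design that ignores the household clustering
dsgn_person <- svydesign(ids = ~personID, weights = ~weight, data = data)

model1 <- svyglm(smoking ~ education + jobPrestige,
                 design = dsgn, family = quasibinomial())
model_person <- svyglm(smoking ~ education + jobPrestige,
                       design = dsgn_person, family = quasibinomial())

# Same coefficients under both designs; only the standard errors differ
se_compare <- cbind(
  estimate     = coef(model1),
  se_household = sqrt(diag(vcov(model1))),
  se_person    = sqrt(diag(vcov(model_person)))
)
print(se_compare)
```

The coefficients are identical under both designs because the weights are the same; the design only changes how the variance is estimated. If the real survey also has sampling strata, they would go in the strata argument as a formula alongside ids = ~householdID.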