Statsmodels Clustered Logit Model With Robust Standard Errors

I have the following dataframe:


        id    case   volfluid   map   rr    o2    fluid
1044    36    3      3.0        1.0   3.0   2.0   0.0
1045    37    3      2.0        3.0   1.0   2.0   1.0
1046    38    3      3.0        2.0   2.0   1.0   0.0
1047    36    4      2.0        3.0   1.0   3.0   1.0
1048    37    4      1.0        1.0   3.0   3.0   1.0

I want to run a logistic regression model clustered on id and with robust standard errors. Here is what I have so far:

import statsmodels.formula.api as smf

# Logit fit with standard errors clustered on id
smf.logit('''fluid ~ C(volfluid) + C(map, Treatment(reference = 3.0)) +
             C(o2, Treatment(reference = 3.0)) + C(rr) +
             C(case, Treatment(reference = 4))''',
             data = df).fit(cov_type='cluster', cov_kwds={'groups': df['id']})

I'm not sure whether this accomplishes both the clustering and the robust standard errors. I understand that setting cov_type = 'HC0' provides robust standard errors, but if I do that, can I still cluster on id? And do I need to do that, or are clustered standard errors inherently robust?

Thank you!


  • Cluster-robust standard errors are also heteroscedasticity robust (HC), so cov_type='cluster' already covers both. The HC cov_types do not take any within-cluster correlation into account.

    Related aside: Using GEE with an independence correlation structure has the same underlying model as Logit, but it has the option of bias-reduced cluster-robust standard errors (similar to CR3, the HC3 analogue for cluster correlations).