I have the following dataframe:
df.head()
id case volfluid map rr o2 fluid
1044 36 3 3.0 1.0 3.0 2.0 0.0
1045 37 3 2.0 3.0 1.0 2.0 1.0
1046 38 3 3.0 2.0 2.0 1.0 0.0
1047 36 4 2.0 3.0 1.0 3.0 1.0
1048 37 4 1.0 1.0 3.0 3.0 1.0
.
.
.
I want to run a logistic regression model clustered on id and with robust standard errors. Here is what I have for the equation:
import statsmodels.formula.api as smf

# Logit with standard errors clustered on id
smf.logit('''fluid ~ C(volfluid) + C(map, Treatment(reference = 3.0)) +
C(o2, Treatment(reference = 3.0)) + C(rr) +
C(case, Treatment(reference = 4))''',
data = df).fit(cov_type='cluster', cov_kwds={'groups': df['id']})
I'm not sure whether this accomplishes both the clustering and the robust standard errors. I understand that setting cov_type = 'hc0' provides robust standard errors, but if I do that, can I still cluster on id? And do I need to do that, or are clustered standard errors inherently robust?
Thank you!
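Cluster-robust standard errors are also heteroscedasticity robust (HC), so the cov_type='cluster' fit in the question already gives you both. The HC cov_types do not take any correlation into account, so switching to one of them would drop the clustering.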
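To make the comparison concrete, here is a minimal sketch that fits the same Logit model with an HC covariance and with a cluster-robust covariance. The data frame `toy`, its single predictor `x`, and the simulated within-id effect are made up for illustration and are not the data from the question. The coefficient estimates are the same in both fits; only the standard errors change.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: 200 ids with 4 rows each and a shared within-id effect.
rng = np.random.default_rng(0)
n_ids, n_cases = 200, 4
ids = np.repeat(np.arange(n_ids), n_cases)
shared = np.repeat(rng.normal(size=n_ids), n_cases)  # induces within-id correlation
x = rng.integers(1, 4, size=n_ids * n_cases)         # categorical predictor with levels 1..3
eta = -0.5 + 0.4 * (x == 2) + 0.8 * (x == 3) + shared
prob = 1.0 / (1.0 + np.exp(-eta))
toy = pd.DataFrame({"id": ids, "x": x, "fluid": rng.binomial(1, prob)})

model = smf.logit("fluid ~ C(x)", data=toy)

# Heteroscedasticity-robust only: ignores correlation within id.
res_hc0 = model.fit(cov_type="HC0", disp=False)

# Cluster-robust on id: robust to heteroscedasticity and to within-id correlation.
res_cluster = model.fit(
    cov_type="cluster", cov_kwds={"groups": toy["id"]}, disp=False
)

# Same point estimates, different standard errors.
comparison = pd.DataFrame(
    {"coef": res_cluster.params, "se_HC0": res_hc0.bse, "se_cluster": res_cluster.bse}
)
print(comparison)
```

Only one cov_type can be passed to fit, so there is no way (and no need) to combine 'HC0' with 'cluster'.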
Related aside: Using GEE with an independence correlation structure has the same underlying model as Logit, but has the option of bias-reduced cluster-robust standard errors (similar to CR3, the HC3 analogue for cluster correlations).