I'm trying to run a two-stage least squares (2SLS) regression in Python using the statsmodels library:
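from statsmodels.sandbox.regression.gmm import IV2SLS

# Arguments are (endog, exog, instrument); axis=1 drops columns rather than rows
resultIV = IV2SLS(dietdummy['Log Income'],
                  dietdummy.drop(['Log Income', 'Diabetes'], axis=1),
                  dietdummy.drop(['Log Income', 'Reads Nutri'], axis=1)).fit()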
Reads Nutri is my endogenous variable, my instrument is Diabetes, and my dependent variable is Log Income.
Did I do this right? It is very different from the way I would do it in Stata.
Also, when I call resultIV.summary(), I get a TypeError (something to do with the F statistic being NoneType). How can I resolve this?
I found this question when I wanted to run an IV2SLS regression myself and hit the same problem, so here is an answer for everybody else who lands here.
The statsmodels documentation shows how to use this class. Its arguments are endog, exog, and instrument, in that order, where exog holds all regressors, including the variables that are instrumented, and instrument holds the instruments together with the other control variables. In that sense, your model is fine.
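To make the argument order concrete, here is a minimal sketch on synthetic data. The column names (y, x, z, control, const) and the data-generating process are made up for illustration and have nothing to do with the dietdummy data; the only statsmodels call assumed is IV2SLS(endog, exog, instrument).fit() as described above.

```python
import numpy as np
import pandas as pd
from statsmodels.sandbox.regression.gmm import IV2SLS

# Synthetic, purely illustrative data (not the asker's dataset)
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)        # instrument: affects x, unrelated to the error
control = rng.normal(size=n)  # exogenous control variable
u = rng.normal(size=n)        # unobserved confounder that makes x endogenous
x = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * control + u + rng.normal(size=n)

df = pd.DataFrame({"y": y, "x": x, "z": z, "control": control})
df["const"] = 1.0  # intercept column, added by hand

endog = df["y"]                             # dependent variable
exog = df[["const", "control", "x"]]        # all regressors, incl. the endogenous x
instrument = df[["const", "control", "z"]]  # controls plus the instrument z

results = IV2SLS(endog, exog, instrument).fit()
print(results.params)  # the coefficient on x should land near the true value 2.0
```

Note that the controls (here const and control) appear in both exog and instrument; only the endogenous variable is swapped for its instrument, which mirrors the two drop() calls in the question.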
The TypeError you found is currently an open bug in versions 0.6.0 and 0.8.1 and, according to the milestone, will be fixed in 0.9.0.
Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.
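If you are unsure which version you are running, a small check along these lines may help. It is only a sketch: version_tuple and report are hypothetical helper names, and falling back to the params and bse attributes on older versions is my assumption about a reasonable workaround, not something taken from the bug report.

```python
import re
import statsmodels

def version_tuple(version):
    """Return the leading numeric parts of a version string, e.g. '0.9.0' -> (0, 9, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# True when the installed statsmodels is at least 0.9.0
HAS_SUMMARY_FIX = version_tuple(statsmodels.__version__) >= (0, 9, 0)

def report(results):
    """Print summary() where it is expected to work, otherwise the raw estimates."""
    if HAS_SUMMARY_FIX:
        print(results.summary())
    else:
        # Assumed workaround: read coefficients and standard errors directly
        print(results.params)
        print(results.bse)
```

Calling report(resultIV) on a fitted result would then avoid summary() on the affected versions; upgrading statsmodels remains the simpler route.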