How to do 2SLS IV regression using statsmodels python?


I'm trying to do 2 stage least squares regression in python using the statsmodels library:

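from statsmodels.sandbox.regression.gmm import IV2SLS

resultIV = IV2SLS(dietdummy['Log Income'],                               # dependent variable
                  dietdummy.drop(['Log Income', 'Diabetes'], axis=1),    # regressors, incl. 'Reads Nutri'
                  dietdummy.drop(['Log Income', 'Reads Nutri'], axis=1)  # 'Diabetes' plus the controls
                  ).fit()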

Reads Nutri is my endogenous variable, Diabetes is my instrument, and Log Income is my dependent variable.

Did I do this right? It is quite different from the way I would do it in Stata.

Also, when I call resultIV.summary(), I get a TypeError (something to do with the F-statistic being NoneType). How can I resolve this?


Solution

  • I found this question when I wanted to run an IV2SLS regression myself and hit the same problem, so this is for everybody else who lands here.

    The statsmodels documentation shows how to use this class. Its arguments are endog, exog, and instrument, in that order: exog holds the regressors, including the variables that are being instrumented, while instrument holds the instruments together with the other (exogenous) control variables. In that sense, your model is fine.

    The TypeError you found is currently an open bug in versions 0.6.0 and 0.8.1 and, according to the milestone, will be fixed in 0.9.0.

    Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.
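To make the argument order described in the answer concrete, here is a minimal sketch on synthetic data. The column names Log Income, Reads Nutri and Diabetes come from the question; the Age control, the const column, and every number used to generate the data are assumptions made up for illustration, not the asker's actual dataset. The coefficients are read directly from the fitted result, which does not go through summary().

```python
import numpy as np
import pandas as pd
import statsmodels
from statsmodels.sandbox.regression.gmm import IV2SLS

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data: every number below is invented for illustration.
diabetes = rng.binomial(1, 0.3, n).astype(float)  # instrument
age = rng.normal(40.0, 10.0, n)                   # hypothetical exogenous control
u = rng.normal(size=n)                            # unobserved confounder
# Endogenous regressor: depends on the instrument and on the confounder.
reads_nutri = 0.8 * diabetes + 0.5 * u + rng.normal(size=n)
# Outcome: the true coefficient on 'Reads Nutri' is 0.5 by construction.
log_income = 1.0 + 0.5 * reads_nutri + 0.01 * age + u + rng.normal(size=n)

dietdummy = pd.DataFrame({
    "Log Income": log_income,
    "Reads Nutri": reads_nutri,
    "Diabetes": diabetes,
    "Age": age,
})
dietdummy["const"] = 1.0  # intercept column, added by hand (assumption)

endog = dietdummy["Log Income"]
# exog: all regressors, including the endogenous 'Reads Nutri'.
exog = dietdummy.drop(columns=["Log Income", "Diabetes"])
# instrument: the instrument 'Diabetes' plus the exogenous controls.
instrument = dietdummy.drop(columns=["Log Income", "Reads Nutri"])

result = IV2SLS(endog, exog, instrument).fit()

# Read the estimates directly instead of relying on summary().
coef = pd.Series(np.asarray(result.params), index=exog.columns)
stderr = pd.Series(np.asarray(result.bse), index=exog.columns)
print("statsmodels version:", statsmodels.__version__)
print(pd.DataFrame({"coef": coef, "std err": stderr}))
```

If the fix landed in 0.9.0 as the answer expects, result.summary() should also work on that version or later; on an affected version, the params and bse attributes used above are a way to get at the estimates anyway.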