Tags: python, linear-regression, statsmodels

Polynomial Regression Using statsmodels.formula.api


Please forgive my ignorance. All I'm trying to do is add a squared term to my regression without going to the trouble of defining a new column in my dataframe. I'm using statsmodels.formula.api (imported as stats) because its formula syntax is similar to R's, which I'm more familiar with.

hours_model = stats.ols(formula='act_hours ~ h_hours + C(month) + trend', data = df).fit()

The above works as expected.

hours_model = stats.ols(formula='act_hours ~ h_hours + h_hours**2 + C(month) + trend', data = df).fit()

This omits h_hours**2 and returns the same output as the line above.
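Why h_hours**2 disappears: statsmodels parses formulas with Patsy (at least in the versions that rely on it), and in that syntax ** is the interaction operator, not arithmetic exponentiation. (a + b)**2 expands to a + b + a:b, and a single term interacted with itself collapses back to itself, so h_hours**2 is simply h_hours again. The sketch below runs on made-up random data rather than the question's dataframe and lists the design-matrix columns to make this visible:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "act_hours": rng.uniform(1, 4, 100),
    "h_hours": rng.uniform(1, 4, 100),
})

# In formula syntax ** expands interactions, it does not square numbers,
# so h_hours**2 reduces to the single term h_hours.
plain = smf.ols("act_hours ~ h_hours + h_hours**2", data=df)
print(plain.exog_names)  # ['Intercept', 'h_hours']

# Wrapping the expression in I() makes the parser evaluate it as Python,
# which adds a genuine squared column (three columns in total).
squared = smf.ols("act_hours ~ h_hours + I(h_hours**2)", data=df)
print(squared.exog_names)
```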

I've also tried h_hours^2, math.pow(h_hours,2), and poly(h_hours,2). All of them throw errors.
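On the math.pow attempt: formula terms are evaluated on whole columns, while math.pow only accepts single numbers, which is a likely reason it errors. A vectorized function that is visible from the calling code can be used instead. The sketch below uses np.power as one such choice (random illustrative data again) to square h_hours inside the formula without I():

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "act_hours": rng.uniform(1, 4, 100),
    "h_hours": rng.uniform(1, 4, 100),
})

# np.power works element-wise on the whole column, so it can be called
# directly inside the formula (np must be importable from this scope).
model = smf.ols("act_hours ~ h_hours + np.power(h_hours, 2)", data=df)

# The third design-matrix column is the square of h_hours.
print(np.allclose(model.exog[:, 2], df["h_hours"] ** 2))  # True
```

I() is usually the more readable option, but this shows that any element-wise NumPy function can appear in a formula term.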

Any help would be appreciated.


Solution

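  • Use I(), just as in R. Inside I() the expression is evaluated as ordinary Python, so h_hours**2 is computed as a square instead of being parsed as formula syntax:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    
    # Synthetic data with the same columns as in the question
    np.random.seed(0)
    
    df = pd.DataFrame({'act_hours':np.random.uniform(1,4,100),'h_hours':np.random.uniform(1,4,100),
                      'month':np.random.randint(0,3,100),'trend':np.random.uniform(0,2,100)})
    
    # I() evaluates h_hours**2 as plain arithmetic
    model = 'act_hours ~ h_hours + I(h_hours**2)'
    hours_model = smf.ols(formula=model, data=df)
    
    # First five rows of the design matrix: intercept, h_hours, h_hours squared
    hours_model.exog[:5,]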
    
    array([[ 1.        ,  3.03344961,  9.20181654],
           [ 1.        ,  1.81002392,  3.27618659],
           [ 1.        ,  3.20558207, 10.27575638],
           [ 1.        ,  3.88656564, 15.10539244],
           [ 1.        ,  1.74625943,  3.049422  ]])
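To tie this back to the original question, here is a sketch that fits the full specification (C(month) and trend included) with the squared term wrapped in I(). The data are the same kind of random placeholder values as above, so the fitted coefficients themselves mean nothing:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Random placeholder data with the question's column names (not real data).
np.random.seed(0)
df = pd.DataFrame({
    "act_hours": np.random.uniform(1, 4, 100),
    "h_hours": np.random.uniform(1, 4, 100),
    "month": np.random.randint(0, 3, 100),
    "trend": np.random.uniform(0, 2, 100),
})

# The question's full specification, with the squared term wrapped in I().
formula = "act_hours ~ h_hours + I(h_hours**2) + C(month) + trend"
hours_model = smf.ols(formula=formula, data=df).fit()

# Six coefficients: intercept, two month dummies (month 0 is the baseline),
# h_hours, the squared term and trend.
print(hours_model.params.index.tolist())
```

The column order is chosen by the formula parser and need not match the order written in the formula, so look terms up by name rather than by position.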