
How to include lagged variables in a statsmodels OLS regression


Is there a way to specify a lagged independent variable in a statsmodels OLS regression? Below is a sample DataFrame and my current OLS model specification. I'd like to include a lagged variable in the model.

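import pandas as pd
import statsmodels.formula.api as sm  # assumed import: the formula API, so sm.ols(...) resolves

df = pd.DataFrame({
    "y": [2, 3, 7, 8, 1],
    "x": [8, 6, 2, 1, 9],
    "v": [4, 3, 1, 3, 8],
})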

Current model:

model = sm.ols(formula = 'y ~ x + v', data=df).fit()

Desired model:

model_lag = sm.ols(formula = 'y ~ (x-1) + v', data=df).fit()

 

Solution

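  • I don't think you can create the lag on the fly inside the formula. One option is to build the lagged column first with the pandas shift method and then refer to that column in the formula. Let me know if this is not what you need.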

    import statsmodels.api as sm
    # shift(1) moves x down one row, so each row holds the previous row's x
    df['xlag'] = df['x'].shift(1)
    df
    
       y  x  v  xlag
    0  2  8  4   NaN
    1  3  6  3   8.0
    2  7  2  1   6.0
    3  8  1  3   2.0
    4  1  9  8   1.0
    
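    # the first row has no lagged value (NaN); the formula interface drops such rows by default
    model_lag = sm.formula.ols(formula='y ~ xlag + v', data=df).fit()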
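
For reference, the shift-based approach above can be written as one self-contained script. This is only a sketch under a few assumptions: `add_lags` is a hypothetical helper (not part of pandas or statsmodels), the column naming scheme `x_lag1` is made up for illustration, and it relies on the formula interface dropping rows with missing values by default.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y": [2, 3, 7, 8, 1],
    "x": [8, 6, 2, 1, 9],
    "v": [4, 3, 1, 3, 8],
})


def add_lags(frame, column, lags):
    """Hypothetical helper: return a copy of `frame` with one shifted column per lag."""
    out = frame.copy()
    for k in lags:
        # shift(k) moves values down k rows, leaving NaN in the first k rows
        out[f"{column}_lag{k}"] = out[column].shift(k)
    return out


lagged = add_lags(df, "x", lags=[1])

# The row with a NaN lag is excluded from the fit by the formula interface
model_lag = smf.ols("y ~ x_lag1 + v", data=lagged).fit()

print(int(model_lag.nobs))  # number of rows actually used in the fit
print(model_lag.params)
```

Passing more lags, for example `lags=[1, 2]`, adds `x_lag1` and `x_lag2`, but each extra lag costs one more observation at the start of the sample, which matters with a dataset this small.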