Tags: python, statsmodels, rolling-computation

Puzzles with rolling windows for statsmodels RollingOLS


I am confused about how rolling windows work in statsmodels RollingOLS, as described on the RollingOLS example page. The page says:

Estimated values are aligned so that models estimated using data points t, (t+1), ..., (t + windows) are stored in location (t + windows).

I have some questions:

Q1: Assume we are at row t. If I set RollingOLS(endog, exog, window=60), does it estimate the model using data from [t - 60, t] (i.e. (t - 60), (t - 59), ..., t)? That would be 61 observations, so a window of 61 days rather than 60.

Q2: If we use model.params to extract the estimated coefficients, are the coefficients at row t the OLS results using data from [t - 60, t] (i.e. (t - 60), (t - 59), ..., t)?

Q3: If my guess in Q2 is right, how do we solve the rolling problem mentioned here? That is

I want to run a rolling 100-day window OLS regression estimation, which is:

First for the 101st row, I run a regression of Y-X1,X2,X3 using the 1st to 100th rows, and estimate Y for the 101st row;

Then for the 102nd row, I run a regression of Y-X1,X2,X3 using the 2nd to 101st rows, and estimate Y for the 102nd row;

Then for the 103rd row, I run a regression of Y-X1,X2,X3 using the 3rd to 102nd rows, and estimate Y for the 103rd row;

Until the last row.


Solution

  • There is a small typo in the example. When window is n, the first value computed uses observations 0, 1, ..., n-1 and appears in row n-1 of res.params (res.params.iloc[n-1] for pandas inputs). This is how it should be, so that the window size is actually enforced. You can see this here:

    import numpy as np
    import pandas_datareader as pdr
    
    import statsmodels.api as sm
    from statsmodels.regression.rolling import RollingOLS
    
    # Monthly Fama-French factors and 10 industry portfolio returns
    factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]
    industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
    
    # Excess return of the HiTec portfolio regressed on the market excess return
    endog = industries.HiTec - factors.RF.values
    exog = sm.add_constant(factors["Mkt-RF"])
    rols = RollingOLS(endog, exog, window=60)
    rres = rols.fit()
    
    # Relabel the rows 1, 2, ... so the label equals the number of observations seen
    params = rres.params.copy()
    params.index = np.arange(1, params.shape[0] + 1)
    params.iloc[57:62]
    

    Note that the index here is 1, 2, ..., so the first estimate appears at index 60, indicating that 60 observations were used to compute it.

           const    Mkt-RF
    58       NaN       NaN
    59       NaN       NaN
    60  0.876155  1.399240
    61  0.879936  1.406578
    62  0.953169  1.408826
    

    To get the estimate using data up to and including any point t, use res.params.iloc[t] when the inputs are pandas objects (or res.params[t] when they are NumPy arrays).

    shift

    If you want the estimates to be aligned "out-of-sample", so that the parameter estimates using observations up to and including observation t are aligned to t+1, you can use shift:

    params.shift(1).iloc[57:62]
    

    You see the parameter values that were at 60 are now at 61.