statsmodels RollingOLS: Using numpy arrays only, RollingOLS seems to ignore the x-axis variable


I am sure I must be doing something wrong, but here I go. I am trying to test the RollingOLS function in StatsModels against a known result, but I am getting a result that is unexpected.

I have generated some data in the range [0,2pi] for the function sin(t). I expected that if I take the rolling least squares of this data I should get data that approximates cos(t) due to d/dt( sin(t) ) = cos(t).
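As a quick sanity check of that expectation, here is a minimal sketch that differentiates the same data numerically with `np.gradient`. It is only an illustration of the target curve, not part of the RollingOLS code, and it assumes `np.linspace` is an acceptable stand-in for the grid used in the question.

```python
import numpy as np

# 1001 evenly spaced points on [0, 2*pi]
t = np.linspace(0, 2 * np.pi, 1001)
Y = np.sin(t)

# Finite-difference derivative of Y with respect to t (not the array index).
dY_dt = np.gradient(Y, t)

# Largest deviation from cos(t); small if the expectation is right.
print(np.max(np.abs(dY_dt - np.cos(t))))
```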

I want to use numpy arrays, not a Pandas DataFrame, as I don't want to require Pandas (even though it is great).

The slopes returned by RollingOLS do not approximate cos(t) at all, and RollingOLS seems to be ignoring the t values when generating the x-axis.

Code:

    import math
    import numpy as np
    from statsmodels.regression.rolling import RollingOLS

    # 1001 evenly spaced points on [0, 2*pi]
    t = np.array(range(0,1001))/(1000)*2*math.pi
    Y = np.sin(t)

    window = 2

    # rolling regression of Y on t, then plot the estimated coefficient
    model = RollingOLS(Y,t,window=window)
    results = model.fit()
    results.plot_recursive_coefficient()

The result I am getting can easily be seen to not be cos(t). [Figure: plot from my code, which does not look like cos(t)]

Looking at the x-axis in the figure, it seems to be using the index of t (between 0 and 1000), not the t values themselves (between 0 and 2pi).

Please help me to figure out what I am doing wrong! TIA.


Solution

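  • Ok, I'm embarrassed, but I forgot to add the constant. Without an intercept column, each window is fit with a line through the origin, so the slope comes out as roughly sin(t)/t instead of the local derivative. The code below gives the expected output: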

    import math
    import numpy as np
    from statsmodels.regression.rolling import RollingOLS
    import statsmodels.api as sm
    
    t = np.array(range(0,1001))/(1000)*2*math.pi
    Y = np.sin(t)
    
    window = 2
    
    t = sm.add_constant(t, prepend=True) # add constant as the first column
    model = RollingOLS(Y,t,window=window)
    results = model.fit()
    results.plot_recursive_coefficient()
    

    Resulting in... cos(x)!