I want to fit an autoregressive model to data stored in a DataFrame, with 96 data points per day. The data is solar irradiance in some region, and I know it has a 1-day seasonality. I want a simple linear model using scikit-learn's LinearRegression, and I want to specify which lagged data points to use: the last 10 data points, plus the data point at lag 97, which corresponds to the data point from 24 hours earlier. How can I specify the lagged coefficients that I want to use? I don't want 97 coefficients, just 11 of them: the previous 10 data points and the data point 97 positions back.
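Just build a dataset X with 11 columns [x(t-97), x(t-10), x(t-9), ..., x(t-1)], one row per time step t. The series x(t) is then your target Y. Fitting LinearRegression on X and Y gives you exactly 11 coefficients, one per chosen lag.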
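As a minimal sketch of that idea, assuming pandas and scikit-learn are available: the helper `make_lagged_frame`, the column names `lag_k`, and the synthetic `irradiance` series are all made up for illustration and stand in for your real DataFrame column. The lags are an ordinary Python list, so if the sample from 24 hours earlier turns out to sit at lag 96 (with 96 points per day) rather than 97, only that number needs to change.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def make_lagged_frame(series, lags):
    """Hypothetical helper: build (X, y) with one column per requested lag."""
    # shift(k) moves the series down by k rows, so row t holds x(t-k).
    X = pd.DataFrame({f"lag_{k}": series.shift(k) for k in lags})
    # The first max(lags) rows have no complete history; drop them.
    data = X.assign(y=series).dropna()
    return data.drop(columns="y"), data["y"]


# Synthetic stand-in for the real data: 30 days at 96 points per day.
rng = np.random.default_rng(0)
t = np.arange(96 * 30)
irradiance = pd.Series(
    np.clip(np.sin(2 * np.pi * t / 96), 0, None)
    + 0.05 * rng.standard_normal(t.size),
    name="irradiance",
)

# The previous 10 points plus the single long lag from the question.
lags = list(range(1, 11)) + [97]
X, y = make_lagged_frame(irradiance, lags)

model = LinearRegression().fit(X, y)

# One coefficient per chosen lag, labelled by column name.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
```

With your own data you would pass the relevant DataFrame column instead of `irradiance`. To predict the next value, build a single row with the same 11 lag columns from the most recent observations and call `model.predict` on it.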