I have a multivariate, multi-step prediction problem. I have a target y, which is a time series, and several exogenous variables x, which are also time series.
I am hoping to use a 4-day lag to predict the future value of y, something like the following. Please note that I also want to use the x(t) data for the prediction.
y(t) = f(y(t-4), y(t-3), y(t-2), y(t-1), x(t-4), x(t-3), x(t-2), x(t-1), x(t))
But it looks like the VARMAX model from statsmodels only considers one time step of x?
Also, how can I predict multiple steps of y when x is available? I think I should treat x as an exogenous variable.
Use order = (4,0) and pass the exogenous variables to the exog= parameter as one data frame. Let x be a pandas data frame containing all exogenous variables.
We can create the x_lag data frame that will be passed to the exog= parameter using the code below:
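import pandas as pd

# Stack x(t), x(t-1), ..., x(t-4) side by side; bfill() fills the NaNs that shift() leaves in the first rows.
x_list = []
for q in range(5):
    x_lagged = x.shift(q).bfill()
    x_list.append(x_lagged)
x_lag = pd.concat(x_list, axis=1)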
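One side effect of the loop above is that x_lag ends up with repeated column names, since every lagged copy keeps the names from x. A minimal sketch of the same construction with a suffix per lag follows; the helper name make_lagged_exog and the toy column x1 are made up for illustration, and the back-filling is kept as in the code above.

```python
import pandas as pd

def make_lagged_exog(x: pd.DataFrame, max_lag: int = 4) -> pd.DataFrame:
    """Return x(t), x(t-1), ..., x(t-max_lag) side by side with unique column names."""
    # shift(q) moves every column down by q rows; add_suffix labels the lag
    lagged = [x.shift(q).add_suffix(f"_lag{q}") for q in range(max_lag + 1)]
    # bfill() replaces the leading NaNs with the first observed value
    return pd.concat(lagged, axis=1).bfill()

# Toy data (hypothetical): one exogenous variable observed over six days
x = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
x_lag = make_lagged_exog(x, max_lag=4)
print(x_lag)
```

Note that the back-filled rows at the start are not real lagged observations. If that matters for your data, you could drop the first 4 rows of both y and x_lag instead of back-filling.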
Finally, fit and predict as follows:
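from statsmodels.tsa.statespace.varmax import VARMAX

# order=(4,0): 4 autoregressive lags of y, no moving-average terms
model = VARMAX(y, order=(4,0), exog=x_lag)
model_fit = model.fit()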
# Suppose you have some later observations of the exogenous variables: x_lag_pred
# and you want to forecast 10 steps ahead.
model_fit.forecast(steps=10, exog=x_lag_pred.iloc[:10,:])
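The forecast call needs x_lag_pred to have one row per forecast step, with the same columns in the same order as the exog used for fitting. One way to build it, sketched below, is to prepend the last 4 observed rows of x to the future values before shifting, so the lags of the first forecast steps come from real history rather than from filled values. The helper name make_future_lagged_exog, the frames x_hist and x_future, and the toy numbers are assumptions for illustration. The sketch also assumes x_hist has at least max_lag rows.

```python
import pandas as pd

def make_future_lagged_exog(x_hist: pd.DataFrame, x_future: pd.DataFrame,
                            max_lag: int = 4) -> pd.DataFrame:
    """Build lagged exogenous rows for the forecast horizon only."""
    # Prepend the last max_lag historical rows so early lags are real observations
    combined = pd.concat([x_hist.tail(max_lag), x_future])
    lagged = [combined.shift(q).add_suffix(f"_lag{q}") for q in range(max_lag + 1)]
    # Drop the prepended history rows, leaving one row per forecast step
    return pd.concat(lagged, axis=1).iloc[max_lag:]

# Toy data (hypothetical): six observed days and three known future days
x_hist = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
x_future = pd.DataFrame({"x1": [7.0, 8.0, 9.0]}, index=[6, 7, 8])
x_lag_pred = make_future_lagged_exog(x_hist, x_future, max_lag=4)
print(x_lag_pred)
```

The number of rows in x_lag_pred should equal the steps argument passed to forecast.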