time-series statsmodels arima

How to use multiple previous time steps of exogenous variables in a VARMAX model


I have a multivariate, multi-step prediction problem. I have a target y, which is a time series, and several exogenous variables x, which are also time series.

I would like to use the previous 4 days of data to predict future values of y, as in the formula below. Note that I also want to use x(t) in the prediction.

y(t) = f(y(t-4), y(t-3), y(t-2), y(t-1), x(t-4), x(t-3), x(t-2), x(t-1), x(t))

But it looks like the VARMAX model from statsmodels only considers one time step of x?

Also, how can I predict multiple steps of y when x is available for those steps? I think I should treat x as an exogenous variable.


Solution

    • To use 4 lags of y, set the autoregressive order p to 4, i.e. order=(4, 0).
    • To use 4 lags of the exogenous variables x, create all the lagged exogenous variables manually and pass them to the exog= parameter as a single data frame.

    Let x be a pandas data frame containing all exogenous variables.
    The x_lag data frame to pass to the exog= parameter can be created with the code below:

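    import pandas as pd

    x_list = []

    # q = 0 keeps x(t); q = 1..4 add x(t-1)..x(t-4)
    for q in range(5):
        # shift() leaves NaN in the first q rows; bfill() fills them with the first observed value
        x_lagged = x.shift(q).bfill().add_suffix(f"_lag{q}")
        x_list.append(x_lagged)

    # One column per variable and lag, side by side, with unique column names
    x_lag = pd.concat(x_list, axis=1)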
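
To make the shape of the lagged frame concrete, here is a minimal sketch on toy data. The helper name make_lags, the column name "x", and the values are made up for illustration; only pandas calls are used.

```python
import pandas as pd

# Toy exogenous frame with a single hypothetical column "x".
x = pd.DataFrame({"x": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})


def make_lags(frame, max_lag):
    """Place lags 0..max_lag of every column side by side (hypothetical helper)."""
    lagged = [frame.shift(k).add_suffix(f"_lag{k}") for k in range(max_lag + 1)]
    return pd.concat(lagged, axis=1)


x_lag_raw = make_lags(x, max_lag=4)   # the first 4 rows contain NaN
x_lag = x_lag_raw.bfill()             # NaN replaced by the first observed value

print(x_lag)
```

Note that the backfilled rows only repeat the first observation, so they are not real lags. An alternative, if losing a few rows is acceptable, is to drop the first 4 rows from both x_lag and y instead of backfilling.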
    

    Finally, fit the model and forecast as follows:

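    from statsmodels.tsa.statespace.varmax import VARMAX

    model = VARMAX(y, order=(4, 0), exog=x_lag)
    model_fit = model.fit()
    # x_lag_pred holds the lagged exogenous variables for the forecast period,
    # built the same way as x_lag (same columns, in the same order).
    # Forecast 10 steps ahead: exog must supply exactly one row per step.
    model_fit.forecast(steps=10, exog=x_lag_pred.iloc[:10, :])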