Tags: python-3.x, tensorflow, keras, arima, pmdarima

Forecasting/prediction using ARIMA in python - how does it work?


I am very confused about how to predict/forecast using ARIMA.

Let's assume we have a series called y_orig that we split into y_train and y_test. Assuming that y_orig is not stationary, we could fit ARIMA using the code below:

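# fit ARIMA model (legacy statsmodels API; this ARIMA class was removed in
# statsmodels 0.13 in favor of statsmodels.tsa.arima.model.ARIMA)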
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(y_train, order=(2,1,2))
model_fit = model.fit(disp=0)
print(model_fit.summary())

After fitting the model, we can predict using the code below

n_periods = len(y_test)

# the legacy forecast() returns (forecast, standard errors, confidence intervals)
fc, _, _ = model_fit.forecast(n_periods, alpha=0.05)  # 95% conf

The value fc should give a forecast, which I then compare to y_test. Note that, as expected, y_test is not used in the training phase. Also note that I am not looking for a rolling forecast but for a long-term forecast where the parameters, once estimated, are fixed.

I am very confused because y_test is not used at all in the forecasting phase.
For instance, if we were to use other prediction models (as in Keras or TensorFlow), we would code it as follows.

First, we fit the model in the training phase, which I don't show because it does not matter for my question. Then we predict and see how good our in-sample fit is using the code below.

y_pred_train=model.predict(y_train)

Then we test the model out of sample as below:

y_pred_test=model.predict(y_test)

In this situation, the parameters are not re-estimated and y_test is used in the testing phase to forecast the next value (with fixed parameters).
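To make the contrast concrete, here is a minimal sketch of the two forecasting modes being compared. It is not statsmodels code: it assumes a hand-rolled AR(1) model, and the coefficient phi, its value 0.8, the toy data, and the two helper functions are all hypothetical names chosen for illustration.

```python
# Hypothetical AR(1) model y[t] = phi * y[t-1] + noise, with phi treated as
# already estimated and held fixed (0.8 is an illustrative assumption).
phi = 0.8

y_train = [1.0, 0.9, 0.7, 0.6]
y_test = [0.5, 0.3, 0.4]


def multi_step_forecast(last_value, phi, steps):
    """Long-term forecast: each prediction feeds the next one.

    Only the last training value is needed; y_test is never read.
    """
    forecasts = []
    prev = last_value
    for _ in range(steps):
        prev = phi * prev
        forecasts.append(prev)
    return forecasts


def one_step_forecasts(last_train_value, y_test, phi):
    """Fixed-parameter one-step-ahead forecasts.

    Each prediction uses the previous observed value, so the test
    observations are consumed one by one while phi stays unchanged.
    """
    history = [last_train_value] + list(y_test[:-1])
    return [phi * prev for prev in history]


long_term = multi_step_forecast(y_train[-1], phi, len(y_test))
one_step = one_step_forecasts(y_train[-1], y_test, phi)
```

Both lists start from the same first prediction, because at that point only y_train is known. They diverge afterwards: the long-term forecast keeps feeding on its own output, while the one-step version reads each new test observation.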

Hence my confusion with ARIMA. Why do we not do the same with an ARIMA model?

Please help me understand, as I am very confused.

Thanks so much!!


Solution

  • I think you're a bit confused by .fit and y_train in the ARIMA code block. y_train is a poorly named variable here; it should just be y, the data you want to forecast. An ARIMA model has no training/test phase in the machine-learning sense. Calling fit does a statistical analysis of the input series (it estimates the model's coefficients for that series), and forecast then extends that same series beyond its last observation. That is why y_test never appears in the forecasting call. If you want a forecast that continues from other data (such as y_test), you need to do another statistical analysis on that data (using model.fit) and then another forecast (using model_fit.forecast). Unlike a Keras model, the fitted ARIMA object in the code above does not take new input samples at prediction time, so you can't use it as-is to forecast other data samples.