I have trained an auto_arima model and am now at the stage where I need to use it to make predictions (the model was trained on 5 years of data and I need to forecast the next year).
The initial dataset was a simple time series:
Date Volume
01-01-1995 345
.
.
.
31-12-2000 4783
Steps so far:
import numpy as np
from pmdarima import auto_arima
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Split chronologically: everything before 2019 is training data
df_train = df[df.Date < "2019"]
df_test = df[df.Date >= "2019"].copy()  # copy so the forecast column can be added safely
exogenous_features = ["Consumption_mean_lag30", "Consumption_std_lag30",
"Consumption_mean_lag182", "Consumption_std_lag182",
"Consumption_mean_lag365", "Consumption_std_lag365",
"month", "week", "day", "day_of_week"]
# Search for the best ARIMA order, using the exogenous features as regressors
model = auto_arima(df_train['Volume'], exogenous=df_train[exogenous_features], trace=True, error_action="ignore", suppress_warnings=True)
# Refits the selected model on the same data (auto_arima already returns a fitted model)
model.fit(df_train['Volume'], exogenous=df_train[exogenous_features])
# Forecast the test period; exogenous values are supplied for every forecast step
forecast = model.predict(n_periods=len(df_test), exogenous=df_test[exogenous_features])
df_test["Forecast_ARIMAX"] = forecast
df_test[["Consumption", "Forecast_ARIMAX"]].plot(figsize=(14, 7))
# Compare the forecast against the actual values
print("RMSE of Auto ARIMAX:", np.sqrt(mean_squared_error(df_test.Consumption, df_test.Forecast_ARIMAX)))
print("\nMAE of Auto ARIMAX:", mean_absolute_error(df_test.Consumption, df_test.Forecast_ARIMAX))
The above gives me a satisfactory model.
When I try to predict using the following:
model.predict(n_periods=365)
I keep getting the error:
ValueError: When an ARIMA is fit with an X array, it must also be provided one for predicting or updating observations.
I have tried to troubleshoot this but can't work out how to provide an 'X array' or what the error is telling me.
If anyone has any insights or can help in any way, I'd really appreciate it.
Thanks.
You trained your model with exogenous data, so it depends on both your time series and that additional data. When you make predictions, you have to provide the exogenous data for the time frame you are trying to predict.
This is the correct way to generate predictions, by providing the exogenous data:
forecast = model.predict(n_periods=len(df_test), exogenous=df_test[exogenous_features])
Here you are missing the exogenous data, hence the error (the "X array" in the message is your exogenous data, i.e. the exogenous_features columns for the forecast period):
model.predict(n_periods=365)
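To make the "X array" requirement concrete: the exogenous frame passed to predict needs one row per forecast period and the same feature columns the model was trained on. Here is a minimal sketch of a sanity check you could run before predicting. Note that `check_future_exog` is a hypothetical helper (not part of pmdarima), and the toy frame and feature list only stand in for your real future data:

```python
import pandas as pd


def check_future_exog(future_exog, n_periods, feature_names):
    """Hypothetical helper: validate a future exogenous frame before predicting."""
    # Every feature used in training must be present
    missing = [c for c in feature_names if c not in future_exog.columns]
    if missing:
        raise ValueError(f"missing exogenous columns: {missing}")
    # One row of exogenous values is needed per forecast step
    if len(future_exog) != n_periods:
        raise ValueError(f"expected {n_periods} rows, got {len(future_exog)}")
    # Return the columns in the same order as the training features
    return future_exog[list(feature_names)]


# Toy stand-in for real future data: 7 days, 2 assumed features
features = ["month", "day_of_week"]
dates = pd.date_range("2020-01-01", periods=7, freq="D")
toy = pd.DataFrame(
    {"day_of_week": dates.dayofweek, "month": dates.month}, index=dates
)
checked = check_future_exog(toy, n_periods=7, feature_names=features)
```

The checked frame is what you would then pass along, e.g. `model.predict(n_periods=7, exogenous=checked)`. As far as I know, recent pmdarima releases name this keyword `X` rather than `exogenous`, so check which one your installed version expects.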
The point of exogenous data is that it may improve your model significantly, but you need to know this data in advance to make predictions.
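Calendar features such as month, week, day and day_of_week are known in advance for any future date, so those can simply be generated. Below is a rough sketch, assuming daily data; `make_calendar_features` and the start date are hypothetical and you would use the day after your last observation. The lagged statistics (Consumption_mean_lag30 and so on) are a different matter: they are presumably derived from past values of the series, which are not yet observed for much of a 365-day horizon, so they would have to be forecast themselves or left out of the model.

```python
import pandas as pd


def make_calendar_features(start, n_periods):
    """Hypothetical helper: calendar features for n_periods daily steps from start."""
    idx = pd.date_range(start=start, periods=n_periods, freq="D")
    return pd.DataFrame(
        {
            "month": idx.month,
            # ISO week number, converted to a plain integer array
            "week": idx.isocalendar().week.astype(int).to_numpy(),
            "day": idx.day,
            "day_of_week": idx.dayofweek,  # Monday=0 ... Sunday=6
        },
        index=idx,
    )


# Assumed start date, for illustration only
future_exog = make_calendar_features("2020-01-01", 365)
print(future_exog.head())
```

For this to be usable, the model would need to be trained on these calendar columns only; then `model.predict(n_periods=365, exogenous=future_exog)` has a value for every feature at every forecast step.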