I am trying to predict the next values in a time series using an ARIMA model. Here is my code (sorry for the typos):
split_val = floor(len(data_file)*0.8)
train = data_file[["Daily Confirmed"]][:split_val]
tesst = data_file[["Daily Confirmed"]][split_val:]
print(train.head())
print(tesst.head())
p = d = q = range(1, 5)
pdq = list(itertools.product(p, d, q))
# print(pdq)
bestvalues = {}
for i in pdq:
    try:
        p, d, q = i
        moodel = ARIMA(train, order=(p, d, q))
        trained_model = moodel.fit()
        bestvalues[trained_model.aic] = i
        print(trained_model.aic, " ", i)
    except Exception:
        # skip orders for which the fit fails
        continue
print(bestvalues)
minaic = min(bestvalues.keys())
moodel = ARIMA(train, order=bestvalues[minaic])
trained_model = moodel.fit()
pridiction = trained_model.forecast(steps=len(tesst))[0]
comparisionn = tesst.copy()
comparisionn["forcastted"] = pridiction.tolist()
comparisionn.plot()
print(comparisionn)
print(trained_model.aic)
plt.show()
(The data is pre-processed.)
The minimum AIC I can get is 2145.930883796257,
and here are the predictions against the test data (first 5 rows only):
            Daily Confirmed    forcastted
Date
2020-06-22            13560  15048.987970
2020-06-23            15656  15349.247935
2020-06-24            16868  15905.260648
2020-06-25            18205  16137.086959
2020-06-26            18255  16237.232886
And here is the plot.
As you can see, the prediction is not accurate, even though I have brute-forced all values of p, d and q up to 4.
What might be the problem? Thanks.
You should get better results if you update your model daily. Judging by your output, the model has not seen any data after June 21st, while the date being predicted may be as late as August 14th. ARIMA may struggle when predicting 20-30 steps ahead. Instead, try forecasting one step at a time, refitting on everything observed so far, like this:
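# Assumes train and test are 1-D (a pandas Series or a list), not DataFrames;
# with the question's data, select the column first, e.g. train["Daily Confirmed"].
history_endog = list(train.copy(deep=True))
y_true = []
y_pred = []
for obs in test:
    # (p, d, q) is the order you selected, e.g. bestvalues[minaic]
    model = ARIMA(endog=history_endog, order=(p, d, q))
    model_fit = model.fit()
    # one-step-ahead forecast made before the model sees obs
    forecast = model_fit.forecast()[0]
    y_true.append(obs)
    y_pred.append(forecast)
    # reveal the actual value so the next refit includes it
    history_endog.append(obs)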
Then plot y_true and y_pred; your results should improve. The code sample above uses lists for the sake of simplicity.
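To see why the step-by-step approach tends to help, here is a minimal, self-contained sketch that compares it against a single long-horizon forecast. It does not use ARIMA: `drift_forecast` is a hypothetical stand-in for "fit a model and forecast one step", and the series is synthetic, so the numbers say nothing about your data. The helper names (`walk_forward`, `multi_step`, `rmse`) are my own, not from any library.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def drift_forecast(history: Sequence[float]) -> float:
    """Hypothetical stand-in for an ARIMA fit: last value plus the mean past step."""
    if len(history) < 2:
        return float(history[-1])
    mean_step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + mean_step


def walk_forward(
    train: Sequence[float],
    test: Sequence[float],
    one_step: Callable[[Sequence[float]], float],
) -> Tuple[List[float], List[float]]:
    """Forecast one step at a time, adding each actual value to the history."""
    history = list(train)
    y_true, y_pred = [], []
    for obs in test:
        y_pred.append(one_step(history))  # forecast made before seeing obs
        y_true.append(float(obs))
        history.append(obs)  # reveal the actual value for the next step
    return y_true, y_pred


def multi_step(
    train: Sequence[float],
    steps: int,
    one_step: Callable[[Sequence[float]], float],
) -> List[float]:
    """Forecast `steps` ahead with no new data, feeding predictions back in."""
    history = list(train)
    preds = []
    for _ in range(steps):
        nxt = one_step(history)
        preds.append(nxt)
        history.append(nxt)  # the model only ever sees its own predictions
    return preds


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic, accelerating series (illustration only, not the question's data).
series = [float(t * t) for t in range(30)]
train, test = series[:24], series[24:]

y_true, y_pred = walk_forward(train, test, drift_forecast)
static_pred = multi_step(train, len(test), drift_forecast)

rolling_rmse = rmse(y_true, y_pred)
static_rmse = rmse(test, static_pred)
print(f"rolling (one-step) RMSE: {rolling_rmse:.2f}")
print(f"static (multi-step) RMSE: {static_rmse:.2f}")
```

To apply this to the question, you could pass a function that fits ARIMA on `history` and returns its one-step forecast in place of `drift_forecast`. Keep in mind that the rolling error measures one-day-ahead accuracy, so it is not directly comparable to a 20-30 day forecast if that is what you actually need.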