python time-series data-science statsmodels arima

ARIMA model predictions are not accurate

I am trying to predict the next values in a time series using an ARIMA model. Here is my code (sorry for the typos in the variable names):

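# Assumed imports (not shown in the original post). The forecast(...)[0]
# indexing below matches the legacy statsmodels.tsa.arima_model.ARIMA,
# which was removed in statsmodels 0.13.
import itertools
from math import floor

import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARIMA

# data_file: pre-processed DataFrame indexed by Date with a "Daily Confirmed" column
split_val = floor(len(data_file)*0.8)
train = data_file[["Daily Confirmed"]][:split_val]
tesst = data_file[["Daily Confirmed"]][split_val:]

print(train.head())
print(tesst.head())

# Each order component takes the values 1..4 (0 is not included)
p = d = q = range(1, 5)

pdq = list(itertools.product(p, d, q))
# print(pdq)
bestvalues = {}  # maps AIC -> (p, d, q)
for i in pdq:
    try:
        p, d, q = i
        moodel = ARIMA(train, order=(p, d, q))
        trained_model = moodel.fit()
        bestvalues[trained_model.aic] = i
        print(trained_model.aic, " ", i)
    except Exception:
        # skip orders that fail to fit
        continue

print(bestvalues)
minaic = min(bestvalues.keys())

# refit with the lowest-AIC order
moodel = ARIMA(train, order=bestvalues[minaic])
trained_model = moodel.fit()

# legacy API: forecast() returns (forecast, stderr, conf_int); [0] is the forecast
pridiction = trained_model.forecast(steps=len(tesst))[0]

comparisionn = tesst.copy()

comparisionn["forcastted"] = pridiction.tolist()
comparisionn.plot()

print(comparisionn)
print(trained_model.aic)
plt.show()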

(the data is pre-processed)
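As a quick check of what "brute forced up to 4" covers, the grid built on the `p = d = q = range(1, 5)` line can be inspected with the standard library alone. This sketch only re-creates that grid; it says nothing about which orders actually fitted, since the `except` branch in the loop skips failures silently.

```python
import itertools

# Same grid as in the question: each of p, d, q takes the values 1..4
p = d = q = range(1, 5)
pdq = list(itertools.product(p, d, q))

print(len(pdq))                             # 64 candidate orders
print(any(0 in order for order in pdq))     # False: no component is ever 0
print(sorted({order[1] for order in pdq}))  # [1, 2, 3, 4]: differencing orders tried
```

So ARIMA(p, 0, q) (no differencing) and models without an AR or MA term, such as ARIMA(0, 1, 1), are never tried, while d = 3 or 4 means differencing the series three or four times.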

The minimum AIC I can get is 2145.930883796257, and here are the predictions against the test data (first 5 rows only):

            Daily Confirmed    forcastted
Date                                     
2020-06-22            13560  15048.987970
2020-06-23            15656  15349.247935
2020-06-24            16868  15905.260648
2020-06-25            18205  16137.086959
2020-06-26            18255  16237.232886
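To put a number on the gap, the five rows above can be scored by hand. This is plain Python using only the values printed in the table; it is a rough illustration, not a substitute for scoring the whole test set.

```python
# Values copied from the first five test rows above
actual = [13560, 15656, 16868, 18205, 18255]
forecast = [15048.987970, 15349.247935, 15905.260648, 16137.086959, 16237.232886]

# Signed error (actual - forecast): positive means the model under-forecasts
errors = [a - f for a, f in zip(actual, forecast)]
mae = sum(abs(e) for e in errors) / len(errors)
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / len(errors)

print([round(e, 1) for e in errors])
print(f"MAE:  {mae:.1f}")
print(f"MAPE: {mape:.1f}%")
```

Apart from the first day, the forecast stays below the actual values, and the gap grows to roughly 2,000 cases by the fourth and fifth days, which is what a forecast falling behind a rising series looks like.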

And here is the plot:

[Figure: "Daily Confirmed" (test data) and "forcastted" plotted over the test period]

As you can see, the prediction is not accurate, and I have brute-forced all the values of p, d, and q from 1 up to 4.

What might be the problem? Thanks.

Solution

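You should get better results if you update your model "daily". Your model hasn't seen any data after the end of the training set (June 21st, since the test data starts on June 22nd), yet it has to forecast the entire test period in one go, and ARIMA may struggle to predict 20-30 steps ahead. Instead, try forecasting one step at a time, adding each true observation to the history before the next forecast, like this: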

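# Walk-forward forecasting: refit on everything seen so far, predict one
# step ahead, then add the true value to the history.
history_endog = train["Daily Confirmed"].tolist()  # list(train) would only give column names
y_true = []
y_pred = []

best_order = bestvalues[minaic]  # order chosen by the AIC search

for obs in tesst["Daily Confirmed"]:
    model = ARIMA(endog=history_endog, order=best_order)
    model_fit = model.fit()
    forecast = model_fit.forecast()[0]  # one-step-ahead forecast

    y_true.append(obs)
    y_pred.append(forecast)
    history_endog.append(obs)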

Then plot y_true against y_pred; your results should improve. The code sample above uses plain lists for the sake of simplicity.