I'm working on a data analysis project with time series and forecasting. I have a dataframe with a lot of data, from which I need to handle Covid cases. The dataframe looks like this:
            Covid cases  Confirmed Infections Difference
date
2020-02-24           19                              NaN
2020-02-25            0                            -19.0
2020-02-26            0                              0.0
2020-02-27            1                              1.0
2020-02-28            2                              1.0
...                 ...                              ...
2021-02-25         1502                           -136.0
2021-02-26         1468                            -34.0
2021-02-27         1474                              6.0
2021-02-28          715                           -759.0
2021-03-01          298                           -417.0
To make a prediction I use the ARIMA model (the dataframe is stationary), and then I try to add a forecast line to my graph. I set some parameters for ARIMA and SARIMAX and then plot the graph with pandas. The line fits the time series, but the forecast does not appear where the series ends.
Code:
def timeseries(dataframe, city_name):
    cols = ['ID', 'name']  # Creating columns to be dropped
    dataframe.drop(cols, axis=1, inplace=True)  # Dropping columns that I don't need
    dataframe.columns = ["date", "Covid cases"]
    dataframe.describe()
    dataframe.set_index('date', inplace=True)
    dataframe.plot(figsize=(15, 6))  # Setting figure size
    dataframe['Confirmed Infections Difference'] = dataframe['Covid cases'] - dataframe['Covid cases'].shift(1)
    adfuller_test(dataframe['Confirmed Infections Difference'].dropna())
    model = ARIMA(dataframe['Covid cases'], order=(1, 1, 1))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())
    dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)
    model = sm.tsa.statespace.SARIMAX(dataframe['Covid cases'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
    results = model.fit()
    dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)
    future_dates = [dataframe.index[-1] + DateOffset(months=x) for x in range(0, 24)]
    future_datest_df = pd.DataFrame(index=future_dates[1:], columns=dataframe.columns)
    future_datest_df.tail()
    future_df = pd.concat([dataframe, future_datest_df])
    future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
    future_df[['Covid cases', 'forecast']].plot(figsize=(12, 8))
Here is the result graph:
As you can see, the forecast does not seem to be applied correctly. I suppose the problem is with some of the parameters I'm passing to ARIMA and SARIMAX.
An example of expected graph:
Reminder: the date column has one entry per day. I want the forecast to cover the next few days.
Any thoughts?
In several steps of your implementation, you assign the results of new calculations to the same column, dataframe['forecast'], so each assignment overwrites the previous one (besides predicting values twice with different models and concatenating dataframes with identically named columns):
print(model_fit.summary())
dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)
# ...
dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)
# ...
future_df = pd.concat([dataframe, future_datest_df])
future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
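To illustrate why repeated assignment to the same column matters, here is a minimal pandas-only sketch. The data and the two "prediction" series are hypothetical stand-ins, not output from your models. It relies on the fact that assigning a Series to a DataFrame column aligns on the index: the second assignment replaces the first, and any labels that are not in the frame's index are dropped without an error.

```python
import pandas as pd

# Hypothetical stand-in data: five daily observations.
index = pd.date_range("2021-02-25", periods=5, freq="D")
frame = pd.DataFrame({"Covid cases": [1502, 1468, 1474, 715, 298]}, index=index)

# Hypothetical "predictions" (not real model output):
# the first covers dates that exist in the frame,
# the second covers dates that are NOT in the frame's index.
in_sample = pd.Series([1500.0, 1470.0], index=index[:2])
out_of_sample = pd.Series(
    [300.0, 310.0],
    index=pd.date_range("2021-03-02", periods=2, freq="D"),
)

# First assignment: two rows receive values, the rest are NaN.
frame["forecast"] = in_sample
first_count = int(frame["forecast"].notna().sum())

# Second assignment to the same column: the earlier values are gone,
# and because no index label matches, every row is NaN.
frame["forecast"] = out_of_sample
second_count = int(frame["forecast"].notna().sum())

print(first_count, second_count)
```

Writing each prediction to its own column (for example one per model) would keep the earlier results available for comparison in the plot.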
Please make sure that:
I can't be certain, because I don't have the full output of your code, but the error in the plot may come from one of these points.
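Since the date column is daily and the forecast should cover the next few days, one more thing worth checking is the step used to build the future dates. The sketch below uses hypothetical data and a placeholder forecast (the last observed value repeated) instead of a fitted model, and the horizon of 7 days is an assumption. It only shows how to extend a daily index by days rather than months so that a forecast indexed by those days lands right after the last observation.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical stand-in data: five daily observations.
index = pd.date_range("2021-02-25", periods=5, freq="D")
frame = pd.DataFrame({"Covid cases": [1502, 1468, 1474, 715, 298]}, index=index)

horizon = 7  # assumed number of days to forecast

# Stepping by months (as in the question) produces dates a month apart.
monthly_dates = [frame.index[-1] + DateOffset(months=x) for x in range(1, horizon + 1)]
monthly_gap_days = (monthly_dates[1] - monthly_dates[0]).days

# Stepping by days keeps the daily frequency of the observed data.
daily_dates = pd.date_range(frame.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D")

# Extend the frame with empty rows for the future days.
extended = frame.reindex(frame.index.union(daily_dates))

# Placeholder forecast, NOT a model: the last observed value repeated.
# With a fitted model, a series indexed by daily_dates would go here instead.
placeholder = pd.Series(float(frame["Covid cases"].iloc[-1]), index=daily_dates)
extended["forecast"] = placeholder

print(extended.tail(horizon + 1))
```

With a fitted statsmodels result, the placeholder could presumably be replaced by an out-of-sample prediction such as results.forecast(steps=horizon), or by predict with start and end positions beyond the last observed row; start=104, end=120 fall inside a sample of roughly a year of daily data, which would place the forecast in the middle of the series rather than at its end.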