
Parameters of ARIMA and SARIMAX


I'm working on a data analysis project involving time series and forecasting. I have a dataframe with a lot of data, from which I need to handle the Covid cases. The dataframe looks like this:

            Covid cases  Confirmed Infections Difference
date                                                    
2020-02-24           19                              NaN
2020-02-25            0                            -19.0
2020-02-26            0                              0.0
2020-02-27            1                              1.0
2020-02-28            2                              1.0
...                 ...                              ...
2021-02-25         1502                           -136.0
2021-02-26         1468                            -34.0
2021-02-27         1474                              6.0
2021-02-28          715                           -759.0
2021-03-01          298                           -417.0

To make a prediction I use the ARIMA model (the dataframe is stationary), and then I try to add a forecast line to my graph. I fit ARIMA and SARIMAX with some parameters and plot the result with pandas. The forecast line follows the time series, but it does not extend beyond the point where the observed data ends.

Code:

def timeseries(dataframe, city_name):
    cols = ['ID', 'name']  # Creating columns to be dropped
    dataframe.drop(cols, axis=1, inplace=True)  # Dropping columns that I don't need
    dataframe.columns = ["date", "Covid cases"]
    dataframe.describe()
    dataframe.set_index('date', inplace=True)
    dataframe.plot(figsize=(15, 6))  # Setting figure size
    dataframe['Confirmed Infections Difference'] = dataframe['Covid cases'] - dataframe['Covid cases'].shift(1)
    adfuller_test(dataframe['Confirmed Infections Difference'].dropna())
    model = ARIMA(dataframe['Covid cases'], order=(1, 1, 1))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())
    dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)
    model = sm.tsa.statespace.SARIMAX(dataframe['Covid cases'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
    results = model.fit()
    dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)
    future_dates = [dataframe.index[-1] + DateOffset(months=x) for x in range(0, 24)]
    future_datest_df = pd.DataFrame(index=future_dates[1:], columns=dataframe.columns)

    future_datest_df.tail()

    future_df = pd.concat([dataframe, future_datest_df])

    future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
    future_df[['Covid cases', 'forecast']].plot(figsize=(12, 8))

Here is the result graph:

[image: result graph, not included]

As you can see, the forecast does not seem to be applied correctly. I suppose the problem is with some of the parameters I'm passing to ARIMA and SARIMAX.

An example of expected graph:

[image: expected graph, not included]

Reminder: the date column has one entry per day. I want the forecast to cover the next few days.

Any thoughts?


Solution

  • In several steps of your implementation, you assign the results of a new calculation to the same column, dataframe['forecast'] (in addition to predicting values twice with two different models and concatenating dataframes that share column names):

    print(model_fit.summary())
    dataframe['forecast'] = model_fit.predict(start=90, end=103, dynamic=True)
    
    # ...
    
    dataframe['forecast'] = results.predict(start=90, end=103, dynamic=True)
    
    # ...
    
    future_df = pd.concat([dataframe, future_datest_df])
    
    future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
    

    Please make sure that:

    • You are not overwriting the whole column with each assignment when you intended to append new entries to the dataframe;
    • You are selecting the right columns for the final plot, given that several columns share similar names.
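To illustrate the first point, here is a minimal sketch using only pandas and made-up numbers. The dates, values and the names `first` and `second` are hypothetical stand-ins for two prediction results, not your data. It suggests that assigning a Series to an existing column replaces the whole column after aligning on the index, so rows the new Series does not cover end up as NaN:

```python
import pandas as pd

# Hypothetical daily data, standing in for the real dataframe
idx = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame({"Covid cases": [5, 7, 9, 11, 13, 15]}, index=idx)

# Two hypothetical prediction results covering different date ranges
first = pd.Series([8.5, 10.5], index=idx[2:4])
second = pd.Series([12.5, 14.5], index=idx[4:6])

df["forecast"] = first
df["forecast"] = second  # replaces the column; the values from `first` are gone
overwritten = df["forecast"].copy()

# Combining the pieces before assigning keeps both ranges
df["forecast"] = pd.concat([first, second])
combined = df["forecast"].copy()

print(overwritten.notna().sum())  # 2
print(combined.notna().sum())     # 4
```

If this matches what happens in your function, only the result of the last assignment survives in each dataframe.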

    I cannot be sure, because I don't have the full output of your code, but the error in the plot may come from one of these points.
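One more thing that may be worth checking, since the question says the data is daily: the future index in the code is built with DateOffset(months=x), which steps one month at a time. The sketch below uses only pandas and hypothetical values (the two observed rows are copied from the end of the table in the question; the horizon of 3 is an assumption). It compares monthly and daily steps and extends the frame with empty daily rows that a forecast could later fill:

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

last_date = pd.Timestamp("2021-03-01")

# Monthly steps, as in the question's code: 2021-04-01, 2021-05-01, 2021-06-01
monthly_steps = [last_date + DateOffset(months=x) for x in range(1, 4)]

# Daily steps, matching a daily index: 2021-03-02, 2021-03-03, 2021-03-04
daily_steps = pd.date_range(start=last_date + pd.Timedelta(days=1),
                            periods=3, freq="D")

# Last two observed rows from the question, used here only as sample data
observed = pd.DataFrame(
    {"Covid cases": [715, 298]},
    index=pd.to_datetime(["2021-02-28", "2021-03-01"]),
)

# Extend the frame with empty rows for the future days
future_df = observed.reindex(observed.index.union(daily_steps))
print(future_df)
```

Whether this is the cause of the missing forecast line is an assumption on my part; it depends on how the positions passed to predict line up with the rows of the extended dataframe.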