Tags: python, pandas, statsmodels

ARMA model function for future unseen data with start and end dates?


I have a DataFrame like this:

import pandas as pd

lstvals = [30.81, 27.16, 82.15, 31.00, 9.13, 11.77, 25.58, 7.57, 7.98, 7.98]
lstdates = ['2021-01-01', '2021-01-05', '2021-01-09', '2021-01-13', '2021-01-17',
            '2021-01-21', '2021-01-25', '2021-01-29', '2021-02-02', '2021-02-06']

data = {
    "Dates": lstdates,
    "Market Value": lstvals,
}

df = pd.DataFrame(data)
df.set_index('Dates', inplace=True)
df
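A quick check on the sample rows above hints at why date-based start and end arguments fail: after set_index on the string column, the index holds plain strings rather than timestamps, so there is no frequency that statsmodels could extend past the end of the data. The sketch below (pandas only, using just the ten sample rows) converts the labels with pd.to_datetime and asks pd.infer_freq for the spacing.

```python
import pandas as pd

lstvals = [30.81, 27.16, 82.15, 31.00, 9.13, 11.77, 25.58, 7.57, 7.98, 7.98]
lstdates = ['2021-01-01', '2021-01-05', '2021-01-09', '2021-01-13', '2021-01-17',
            '2021-01-21', '2021-01-25', '2021-01-29', '2021-02-02', '2021-02-06']

df = pd.DataFrame({"Dates": lstdates, "Market Value": lstvals})
df.set_index('Dates', inplace=True)

# The index contains strings, not timestamps, so it carries no frequency.
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# Parsing the labels lets pandas infer the regular spacing between observations.
parsed = pd.to_datetime(df.index)
inferred = pd.infer_freq(parsed)

print(is_datetime_index)  # False
print(inferred)           # 4D
```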

I want to forecast values outside this sample, for example from '2021-02-10' to '2022-04-23'. (My full dataset runs from '2021-01-01' to '2023-11-09', and I want to forecast the following year, from '2024-01-01' to '2024-11-09'.)
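Because the observations arrive every 4 days, a date horizon corresponds to a number of forecast steps on that 4-day grid. The pandas-only sketch below assumes the sample's last date (2021-02-06) and a grid anchored at 2021-01-01; it lists the future grid dates up to the example end date and counts them.

```python
import pandas as pd

# Last observation in the sample and the 4-day spacing of the data.
last_obs = pd.Timestamp('2021-02-06')
step = pd.Timedelta(days=4)

# All grid dates after the sample, up to the requested end date.
# '2022-04-23' is not itself on the 4-day grid that starts at 2021-01-01,
# so date_range stops at the last grid date before it.
future_index = pd.date_range(start=last_obs + step, end='2022-04-23', freq='4D')

n_steps = len(future_index)
print(future_index[0].date(), future_index[-1].date(), n_steps)
```

The count could then be passed as steps to forecast or get_forecast. Note that the example end date falls between two grid points (2022-04-22 and 2022-04-26), so a forecast can only be produced for one of those neighbouring dates.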

https://www.statsmodels.org/devel/examples/notebooks/generated/statespace_forecasting.html

I have defined and fitted my model as follows, and it predicts the test data:


from statsmodels.tsa.statespace.sarimax import SARIMAX

train = df['Market Value'].iloc[:1187]
test = df['Market Value'].iloc[-200:]
...
# order=(2, 1, 2) includes one difference (d=1), so this is an ARIMA(2,1,2) model
ARMAmodel = SARIMAX(y, order=(2, 1, 2))
ARMAResults = ARMAmodel.fit()

...
# Forecast as many steps as there are test observations
y_pred = ARMAResults.get_forecast(len(test.index))
y_pred_df = y_pred.conf_int(alpha=0.05)
y_pred_df["Predictions"] = ARMAResults.predict(start=y_pred_df.index[0], end=y_pred_df.index[-1])
y_pred_df.index = test.index
y_pred_out = y_pred_df["Predictions"]

...
import matplotlib.pyplot as plt

plt.plot(train, color="black", label="Train")
plt.plot(test, color="red", label="Test")
plt.plot(y_pred_out, color="green", label="Predictions")
plt.ylabel('Market Value ($M)')
plt.xlabel('Date')
plt.xticks(rotation=45)
plt.title("Train/Test/Prediction for Market Data")
plt.legend()
plt.show()


How can I make predictions for future dates?

I tried passing the future dates to the forecast method, but it does not work:

ARMAResults.forecast(start = '2024-01-01', end = '2024-11-09')

TypeError: statsmodels.tsa.statespace.mlemodel.MLEResults.predict() got multiple values for keyword argument 'start'



Solution

  • Issues:

    1. Parse the dates and specify a frequency for the date index:
      df = pd.DataFrame(data)
      df['Dates'] = pd.to_datetime(df['Dates'])
      df.set_index('Dates', inplace=True)
      df = df.asfreq('4D')
      
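    2. forecast is strictly for out-of-sample forecasts and has no start or end parameters. Internally it calls predict(start=<end of sample>, end=..., **kwargs), so passing start yourself produces the "got multiple values for keyword argument 'start'" error. (Its steps parameter can also be given a date string or datetime.) Use predict or get_prediction instead; both accept start and end and support in-sample as well as out-of-sample results.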
      ARMAResults.predict(start='2024-01-01', end='2024-11-09')
      
      or
      # mean
      ARMAResults.get_prediction(start='2024-01-01', end='2024-11-09').predicted_mean 
      
      # mean, standard error, prediction interval
      ARMAResults.get_prediction(start='2024-01-01', end='2024-11-09').summary_frame()