python  time-series  arima

Calculate and List Prediction Intervals for a Dataset Python Time Series Analysis


I am looking for a Python library or example that produces a set of prediction intervals (not confidence intervals, since I am predicting future values) for a time series. I have code that reads a CSV file containing two fields: a date field and a value field.

Date   Value
Dec-21    19.80
Jan-22    19.80
Feb-22    19.70
Mar-22    20.00
Apr-22    19.90
May-22    20.00
Jun-22    20.00
Jul-22    20.00
Aug-22    20.00
Sep-22    20.10
Oct-22    20.00
Nov-22    20.10
Dec-22    20.00
Jan-23    20.20
Feb-23    20.30
Mar-23    20.30
Apr-23    20.50
May-23    20.40
Jun-23    20.40
Jul-23    20.60
Aug-23    20.50
Sep-23    20.62
Oct-23    20.64
Nov-23    20.65
Dec-23    20.78
Jan-24    20.74
Feb-24    20.81
Mar-24    20.90
Apr-24    20.85
May-24    21.00
Jun-24    20.97
Jul-24    21.04
Aug-24    21.13
Sep-24    21.09
Oct-24    21.22
Nov-24    21.21
Dec-24    21.25

That code will run an ARIMA model and produce a series of confidence intervals:

import warnings
from pandas import read_csv
from statsmodels.tools.sm_exceptions import ConvergenceWarning
from statsmodels.tsa.arima.model import ARIMA

# silence convergence warnings and any other warnings
warnings.simplefilter('ignore', ConvergenceWarning)
warnings.filterwarnings("ignore")

# summarize multiple confidence intervals on an ARIMA forecast for Diverse %
# load data
# the squeeze=True argument was removed in pandas 2.0; .squeeze("columns") replaces it
series = read_csv("C:\\mydirectory\\myfile.csv", header=0, index_col=0, parse_dates=True).squeeze("columns")


# split data into train and test sets
X = series.values
X = X.astype('float32')
size = len(X) - 1
train, test = X[0:size], X[size:]
# fit an ARIMA model
model = ARIMA(train, order=(5,1,1))
model_fit = model.fit()
result = model_fit.get_forecast(maxiter=200)
forecast = result.predicted_mean
# summarize confidence intervals
intervals = [0.95,0.90,0.85,0.80,0.75,0.70,0.65, 0.60,0.55,0.50,0.45,0.40,0.35,0.30,0.25,0.2, 0.1, 0.05, 0.01]
for a in intervals:
    # alpha is the tail probability, so the interval covers (1 - alpha) of the distribution
    ci = result.conf_int(alpha=a)
    print('%.1f%% Confidence Interval: %.3f between %.3f and %.3f' % ((1-a)*100, forecast[0], ci[0,0], ci[0,1]))

...and it returns the confidence intervals:

5.0% Confidence Interval: 21.252 between 21.247 and 21.256
10.0% Confidence Interval: 21.252 between 21.243 and 21.260
15.0% Confidence Interval: 21.252 between 21.239 and 21.264
20.0% Confidence Interval: 21.252 between 21.234 and 21.269
25.0% Confidence Interval: 21.252 between 21.230 and 21.273
30.0% Confidence Interval: 21.252 between 21.225 and 21.278
35.0% Confidence Interval: 21.252 between 21.220 and 21.283
40.0% Confidence Interval: 21.252 between 21.216 and 21.287
45.0% Confidence Interval: 21.252 between 21.211 and 21.292
50.0% Confidence Interval: 21.252 between 21.205 and 21.298
55.0% Confidence Interval: 21.252 between 21.200 and 21.303
60.0% Confidence Interval: 21.252 between 21.194 and 21.309
65.0% Confidence Interval: 21.252 between 21.188 and 21.316
70.0% Confidence Interval: 21.252 between 21.181 and 21.323
75.0% Confidence Interval: 21.252 between 21.173 and 21.330
80.0% Confidence Interval: 21.252 between 21.164 and 21.339
90.0% Confidence Interval: 21.252 between 21.139 and 21.364
95.0% Confidence Interval: 21.252 between 21.117 and 21.386
99.0% Confidence Interval: 21.252 between 21.075 and 21.428

What I am looking for are the prediction intervals at 5%, 10%...90%.

I have tried running this updated code:

import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA


# Load data
data = pd.read_csv("C:\\mydirectory\\myfile.csv", header=0, parse_dates=True, index_col=0)


# Split data into train and test sets
train_size = int(len(data) * 0.8)
train, test = data.iloc[:train_size], data.iloc[train_size:]


# Fit an ARIMA model
model = ARIMA(train, order=(5, 1, 1))
model_fit = model.fit()


# Forecast future values
n_forecast = len(test)
forecast, stderr, conf_int = model_fit.forecast(steps=n_forecast, alpha=0.05)


# Plot the actual vs. predicted values with prediction intervals
plt.figure(figsize=(12, 6))
plt.plot(train.index, train.values, label='Training Data', color='blue')
plt.plot(test.index, test.values, label='Actual Data', color='green')
plt.plot(test.index, forecast, label='Predicted Data', color='red')


# Fill prediction intervals
plt.fill_between(test.index, conf_int[:, 0], conf_int[:, 1], color='pink', alpha=0.3, label='Prediction Intervals')


plt.title('Time Series Forecast with Prediction Intervals')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend(loc='upper left')
plt.grid(True)
plt.show()


# Print prediction intervals
prediction_intervals = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]


for alpha in prediction_intervals:
    z_score = model_fit.get_forecast(steps=n_forecast).zconfint(alpha=alpha)
    lower_bound = z_score[:, 0]
    upper_bound = z_score[:, 1]
    
    print('%.1f%% Prediction Interval: %.3f between %.3f and %.3f' % ((1 - alpha) * 100, forecast[0], lower_bound[0], upper_bound[0]))

which returns an error:

ValueError                                Traceback (most recent call last)
Cell In[27], line 3
      1 # Forecast future values
      2 n_forecast = len(test)
----> 3 forecast, stderr, conf_int = model_fit.forecast(steps=n_forecast, alpha=0.05)

ValueError: too many values to unpack (expected 3)

Please advise. Thanks.


Solution

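  • Based on the documentation, the forecast method returns a single NumPy array, pandas Series, or pandas DataFrame, depending on the dimensions and inputs. Your code unpacks that result into three names (forecast, stderr, conf_int), which only works for an iterable with exactly 3 elements. Here the result holds one value per forecast step (8, if the file contains only the 37 rows shown), hence "too many values to unpack (expected 3)". The three-value return matches the older statsmodels.tsa.arima_model API, not statsmodels.tsa.arima.model.ARIMA.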