I have created a timeseries SARIMAX model using the statsmodels library in python.
Currently, we are following the below pipeline:
Our overall aim is to only train the model once per week. This means that we want to produce forecasts for more than 1 step on from the date the model was trained. As we will be doing each prediction, after the actual data for the day before has been collected, we want to use the parameters from SARIMAX to produce a prediction. This step is fairly straightforward, you just need to use the coefficents and multiply them by the first level difference for the lag components.
However I am struggling to then recreate the confidence intervals for my predictions, does anyone have any idea on the maths that statsmodels uses to create these or the maths that I should be using?
Thanks in advance, James
Probably the easiest thing to do is to let Statsmodels compute them for you, but using the trained parameters with the new dataset. Here's an example:
First, train the model (your step 1) and save the parameters so that you can use them later
# 1. Train the model
mod = sm.tsa.SARIMAX(endog_train, order=(p, d, q))
res = mod.fit()
# Save the parameters for use later
res.params.to_csv('trained_params.csv')
Then create a new model with the updated dataset, but instead of re-training the parameters (which would use the fit
method), apply the old parameters to the new data (using the smooth
method).
# 2. Use the saved parameters with a new dataset
mod = sm.tsa.SARIMAX(endog_updated, order=(p, d, q))
params = pd.read_csv('trained_params.csv', index_col=[0])['0']
res = mod.smooth(params)
# Produce forecasts and 95% confidence intervals
fcast = res.get_forecast(steps=5, alpha=0.05)
print(fcast.summary_frame())
For the data I was using, this prints:
realgdp mean mean_se mean_ci_lower mean_ci_upper
2009Q4 1.682724 3.717418 -5.603282 8.968729
2010Q1 1.031580 4.360360 -7.514567 9.577728
2010Q2 0.632402 4.578709 -8.341702 9.606506
2010Q3 0.387689 4.658123 -8.742065 9.517443
2010Q4 0.237670 4.687621 -8.949899 9.425239
You can do this as many times as you want, and then when it's time to re-train the model, just go back to #1, above.