Tags: python, statsmodels, forecasting, confidence-interval, holtwinters

How to get confidence intervals for statsmodels.tsa.holtwinters ExponentialSmoothing models in Python?


I did a time series forecasting analysis in Python with ExponentialSmoothing from statsmodels.tsa.holtwinters.

import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

model = ExponentialSmoothing(df, seasonal='mul', seasonal_periods=12).fit()
pred = model.predict(start=df.index[0], end=122)

plt.plot(df_fc.index, df_fc, label='Train')
plt.plot(pred.index, pred, label='Holt-Winters')
plt.legend(loc='best')

[Plot: training data and Holt-Winters predictions]

I want to get confidence intervals for the model's results, but I couldn't find any function for this in statsmodels.tsa.holtwinters.ExponentialSmoothing. How do I do that?


Solution

  • According to this answer in a GitHub issue, you should use the newer ETSModel class rather than the older ExponentialSmoothing, which is still present for compatibility. ETSModel offers more parameters and more functionality than ExponentialSmoothing.

    To calculate confidence intervals, I suggest using the simulate method of ETSResults:

    from statsmodels.tsa.exponential_smoothing.ets import ETSModel
    import pandas as pd
    
    
    # Build model.
    ets_model = ETSModel(
        endog=y, # y should be a pd.Series
        seasonal='mul',
        seasonal_periods=12,
    )
    ets_result = ets_model.fit()
    
    # Simulate predictions.
    n_steps_prediction = y.shape[0]
    n_repetitions = 500
    
    df_simul = ets_result.simulate(
        nsimulations=n_steps_prediction,
        repetitions=n_repetitions,
        anchor='start',
    )
    
    # Calculate confidence intervals.
    upper_ci = df_simul.quantile(q=0.9, axis='columns')
    lower_ci = df_simul.quantile(q=0.1, axis='columns')
    

    Basically, calling the simulate method returns a DataFrame with n_repetitions columns and n_steps_prediction rows (in this case, the same number of items as in your training data set y). You then calculate the confidence intervals with the DataFrame quantile method (remember the axis='columns' option); the 0.1 and 0.9 quantiles used here bound an 80% interval. You could also calculate other statistics from df_simul.
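
    To make the quantile step concrete, here is a small sketch on a hand-made stand-in for df_simul. The numbers are made up rather than produced by statsmodels, and simulation_interval is a hypothetical helper name, not a library function. It only shows how a chosen interval level maps to the two quantiles taken across the simulated paths.

```python
import pandas as pd

# Hypothetical stand-in for df_simul: 3 time steps (rows) x 5 simulated paths (columns).
# The numbers are made up; real values would come from ets_result.simulate(...).
df_simul = pd.DataFrame(
    [
        [10.0, 11.0, 12.0, 13.0, 14.0],
        [20.0, 22.0, 24.0, 26.0, 28.0],
        [30.0, 33.0, 36.0, 39.0, 42.0],
    ],
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)


def simulation_interval(simulations, level=0.8):
    """Return (lower, upper) bounds of a central interval, one value per time step.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    tail = (1.0 - level) / 2.0
    lower = simulations.quantile(q=tail, axis="columns")
    upper = simulations.quantile(q=1.0 - tail, axis="columns")
    return lower, upper


# level=0.8 corresponds to the 0.1 and 0.9 quantiles used above.
lower_ci, upper_ci = simulation_interval(df_simul, level=0.8)

# Any other per-step statistic works the same way, e.g. the median path.
median = df_simul.median(axis="columns")
```

    Changing level to 0.95 would take the 0.025 and 0.975 quantiles instead.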

    I also checked the source code: simulate is called internally by the forecast method to predict future steps. So you can forecast future steps and their confidence intervals with the same approach: just use anchor='end', so that the simulations start from the last step in y.

    To be fair, there is also a more direct way to calculate the confidence intervals: the get_prediction method (which uses simulate internally). But I do not really like its interface; it is not flexible enough for me, and I did not find a way to specify the desired confidence level. The approach with the simulate method is easy to understand and very flexible, in my opinion.

    If you want further details on how these simulations are performed, read this chapter from the excellent online book Forecasting: Principles and Practice.