python pandas statsmodels forecasting holtwinters

How to predict a time series set with statsmodels Holt-Winters


I have a data set from January 2012 to December 2014 that shows some trend and seasonality. I want to forecast the next 3 years (from January 2015 to December 2017) using the Holt-Winters method from statsmodels. The data set is the following:

date,Data
Jan-12,153046
Feb-12,161874
Mar-12,226134
Apr-12,171871
May-12,191416
Jun-12,230926
Jul-12,147518
Aug-12,107449
Sep-12,170645
Oct-12,176492
Nov-12,180005
Dec-12,193372
Jan-13,156846
Feb-13,168893
Mar-13,231103
Apr-13,187390
May-13,191702
Jun-13,252216
Jul-13,175392
Aug-13,150390
Sep-13,148750
Oct-13,173798
Nov-13,171611
Dec-13,165390
Jan-14,155079
Feb-14,172438
Mar-14,225818
Apr-14,188195
May-14,193948
Jun-14,230964
Jul-14,172225
Aug-14,129257
Sep-14,173443
Oct-14,188987
Nov-14,172731
Dec-14,211194

Plotted, it looks like this:

[Image: plot of the monthly data, January 2012 to December 2014]

I'm trying to build a Holt-Winters model, first to check how well it reproduces the past data (that is, a new graph where I can see whether my parameters predict the past well) and then to forecast the following years. I made the in-sample prediction with the following code, but I'm not able to produce the forecast.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Data loading
data = pd.read_csv('setpoints.csv', parse_dates=['date'], index_col=['date'])
df_data = pd.DataFrame(data, columns=['Data'])

# Declare the index as monthly (month start)
df_data['Data'].index.freq = 'MS'
# train and test both point at the full series here
train, test = df_data['Data'], df_data['Data']
model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=12).fit()
period = ['Jan-12', 'Dec-14']
pred = model.predict(start=period[0], end=period[1])

# Plot the data and the in-sample prediction
df_data['Data'].plot(label='Train')
test.plot(label='Test')
pred.plot(label='Holt-Winters')
plt.legend(loc='best')
plt.show()

The result looks like this:

[Image: plot of the data with the Holt-Winters in-sample prediction]

Does anyone know how to forecast it?


Solution

  • I think there is a misconception here. You shouldn't use the same data for train and test. The test data are data points that your model has not seen yet, which lets you check how well the model performs. So I used the last three months of your data as the test set.

    As for the forecast, you can pass different start and end points to predict, including dates after the end of the training data.

    Also note that I used a multiplicative seasonal component (seasonal='mul'), which performs better on your data:

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    
    # read in data and convert date column to MS frequency
    df = pd.read_csv('setpoints.csv')
    df['date'] = pd.to_datetime(df['date'], format='%b-%y')
    df = df.set_index('date').asfreq('MS')
    
    # split data in train, test (last three months held out)
    train = df.loc[:'2014-09-01']
    test = df.loc['2014-10-01':]
    
    # train model and predict
    model = ExponentialSmoothing(train, seasonal='mul', seasonal_periods=12).fit()
    #model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=12).fit()
    pred_test = model.predict(start='2014-10-01', end='2014-12-01')
    pred_forecast = model.predict(start='2015-01-01', end='2017-12-01')
    
    # plot data and prediction
    df['Data'].plot(figsize=(15, 9), label='Data')
    pred_test.plot(label='Test')
    pred_forecast.plot(label='Forecast')
    plt.legend()
    # save before show(); after show() the figure may already be cleared
    plt.savefig('figure.png')
    plt.show()
    
