Error when trying to implement Holt-Winters Exponential Smoothing in Pyspark


I am trying to perform Holt-Winters Exponential Smoothing on my dataset FinalModel, which has Date as its index and a CrimeCount column alongside other columns. I only want to forecast the CrimeCount column, but I am getting the following error:

ValueError: Buffer dtype mismatch, expected 'double' but got 'long long'

My code:

df = FinalModel.copy()
train, test = FinalModel.iloc[:85, 18], df.iloc[85:, 18]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing

df.index.freq = 'MS'
model = ExponentialSmoothing(train.astype(np.int64), seasonal='mul', seasonal_periods=12).fit()
pred = model.predict(start=test.index[0], end=test.index[-1])
plt.plot(train.index, train, label='Train')
plt.plot(test.index, test, label='Test')
plt.plot(pred.index, pred, label='Holt-Winters')
plt.legend(loc='best')

Solution

  • The error says that the input values were expected to be doubles (64-bit floats), but 'long long' values (64-bit integers) were received instead. Casting the input to numpy floats instead of numpy ints will do the trick:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    df = FinalModel.copy()
    df.index.freq = 'MS'  # monthly data, month-start frequency
    # Column position 18 is the series to forecast; the first 85 rows are the training set
    train, test = df.iloc[:85, 18], df.iloc[85:, 18]

    # Cast to float64 ('<f8') so the model receives doubles instead of int64
    model = ExponentialSmoothing(train.astype(np.float64), seasonal='mul', seasonal_periods=12).fit()
    pred = model.predict(start=test.index[0], end=test.index[-1])
    plt.plot(train.index, train, label='Train')
    plt.plot(test.index, test, label='Test')
    plt.plot(pred.index, pred, label='Holt-Winters')
    plt.legend(loc='best')
    

    Most statistical models in statsmodels and sklearn assume that the input values are floats. Many of them do the conversion automatically for you, but it seems that ExponentialSmoothing does not. Either way, casting the input values to floats up front is a good habit for consistency.
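
To check the cast on its own before fitting anything, here is a minimal sketch that needs only pandas and numpy. The monthly counts are made-up stand-ins for the CrimeCount column, and as_float_series is a hypothetical helper name, not part of either library:

```python
import numpy as np
import pandas as pd

# Made-up monthly counts standing in for the CrimeCount column (assumption).
index = pd.date_range("2015-01-01", periods=24, freq="MS")
counts = pd.Series(np.arange(100, 124, dtype=np.int64), index=index, name="CrimeCount")


def as_float_series(series):
    """Return a float64 copy of a numeric series; the index is carried over as-is."""
    if not pd.api.types.is_numeric_dtype(series):
        raise TypeError(f"expected a numeric column, got {series.dtype}")
    return series.astype(np.float64)


float_counts = as_float_series(counts)
print(counts.dtype, "->", float_counts.dtype)
```

The resulting series can then be passed to ExponentialSmoothing in place of train.astype(np.int64). The original series is left unchanged, so plots that use the integer counts keep working.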