python, time-series, statsmodels, forecasting

How do you model retail sales with variable seasonality around holidays?


I am using statsmodels in Python to forecast weekly retail sales with the Walmart Kaggle dataset. I am having trouble achieving stationarity before fitting a SARIMA model. The problem is that Easter can fall weeks apart from one year to the next. How can I model these shifting holidays?

I have tried a grid search for the best (p,d,q)(P,D,Q)m parameters. It returned a SARIMA of (0,1,0)(0,2,0)52 with an AIC of 832, but when plotted the result is obviously badly skewed (which is expected, since my data never actually achieved stationarity with those transformations).

Does anyone have advice for using SARIMAX with retail seasonality? I know the R package is superior, but I don't know R and I am hoping I can solve this without it.

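from statsmodels.tsa.statespace.sarimax import SARIMAX

# Orders returned by the grid search
p, d, q = 0, 1, 0
P, D, Q, m = 0, 2, 0, 52  # m=52: yearly seasonal period on weekly data

# Weekly series with each week labelled on a Friday
sales = train11.Weekly_Sales.asfreq('W-FRI')
model = SARIMAX(sales, order=(p, d, q), seasonal_order=(P, D, Q, m),
                trend='n', enforce_stationarity=False, enforce_invertibility=False)
model_fit = model.fit()
model_fit.summary()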
==========================================================================================
Dep. Variable:                       Weekly_Sales   No. Observations:                  143
Model:             SARIMAX(0, 1, 0)x(0, 2, 0, 52)   Log Likelihood                -415.101
Date:                            Tue, 02 Apr 2019   AIC                            832.202
Time:                                    21:48:24   BIC                            833.813
Sample:                                02-05-2010   HQIC                           832.770
                                     - 10-26-2012                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
sigma2      2.202e+08   1.77e+07     12.406      0.000    1.85e+08    2.55e+08
===================================================================================
Ljung-Box (Q):                       28.96   Jarque-Bera (JB):                77.77
Prob(Q):                              0.79   Prob(JB):                         0.00
Heteroskedasticity (H):               0.00   Skew:                            -1.44
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.49
===================================================================================

Solution

  • The easiest approach is to use dummy variables for holidays and special events. SARIMAX accepts additional explanatory variables through its exog argument.

    A dummy variable can either cover a single holiday, if there are enough years of data to estimate, for example, an Easter effect, or combine several holidays or periods, for example the several weekends before Christmas when shopping is much higher than usual.

    SARIMA on its own cannot capture effects like Easter: even with a one-year seasonal period, the holiday does not recur at a fixed cycle length.
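
As a rough illustration of the dummy-variable idea, here is a minimal sketch that builds an Easter dummy and a combined pre-Christmas dummy for the weekly index in the question and shows where they would go in SARIMAX. The helper names (easter_week_dummy, pre_christmas_dummy, fit_with_holiday_exog) are hypothetical, and the sketch assumes each W-FRI label marks the Friday that ends its week; shift the windows if your weeks are labelled differently. Easter dates come from dateutil.easter (Western Easter by default).

```python
import pandas as pd
from dateutil.easter import easter


def easter_week_dummy(index: pd.DatetimeIndex, weeks_before: int = 1) -> pd.Series:
    """1.0 for the week containing Easter Sunday and `weeks_before` weeks before it."""
    dummy = pd.Series(0.0, index=index, name="easter")
    for year in index.year.unique():
        easter_sunday = pd.Timestamp(easter(int(year)))
        # First weekly label on or after Easter Sunday (assumed week-ending Friday).
        pos = index.searchsorted(easter_sunday)
        # Skip years where Easter falls outside the sampled range.
        if pos < len(index) and index[pos] - easter_sunday < pd.Timedelta(days=7):
            dummy.iloc[max(pos - weeks_before, 0):pos + 1] = 1.0
    return dummy


def pre_christmas_dummy(index: pd.DatetimeIndex, weeks: int = 3) -> pd.Series:
    """One combined dummy: 1.0 for weekly labels in the `weeks` weeks up to Dec 25."""
    dummy = pd.Series(0.0, index=index, name="pre_christmas")
    for year in index.year.unique():
        christmas = pd.Timestamp(year=int(year), month=12, day=25)
        in_window = (index > christmas - pd.Timedelta(weeks=weeks)) & (index <= christmas)
        dummy.loc[in_window] = 1.0
    return dummy


def fit_with_holiday_exog(sales: pd.Series, exog: pd.DataFrame, order, seasonal_order):
    """Fit SARIMAX with the holiday dummies as exogenous regressors."""
    # Imported here so the dummy helpers above can be used without statsmodels.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    model = SARIMAX(sales, exog=exog, order=order, seasonal_order=seasonal_order)
    return model.fit(disp=False)


# Same date range and frequency as the sample in the question's summary output.
index = pd.date_range("2010-02-05", "2012-10-26", freq="W-FRI")
exog = pd.concat([easter_week_dummy(index), pre_christmas_dummy(index)], axis=1)

# Usage with the question's data (orders here are placeholders, not a recommendation):
# results = fit_with_holiday_exog(train11.Weekly_Sales.asfreq("W-FRI"), exog,
#                                 order=(0, 1, 0), seasonal_order=(0, 1, 0, 52))
```

When forecasting, the same dummy columns have to be built for the future weeks and passed along, for example results.forecast(steps=k, exog=future_exog). With only three Easters in the sample, the estimated Easter coefficient will be quite uncertain, which is why combining related periods into one dummy can be the more practical choice.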