I'm working on a time series forecast model with pmdarima.
My time series is short, but reasonably well behaved. The following code raises an error in sklearn\utils\validation.py:
from pmdarima import auto_arima
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
import datetime
import pandas as pd
datelist = pd.date_range('2018-01-01', periods=24, freq='MS')
sales = [26.000000,27.100000,26.000000,28.014286,28.057143,
30.128571,39.800000,33.000000,37.971429,45.914286,
37.942857,33.885714,36.285714,34.971429,40.042857,
27.157143,30.685714,35.585714,43.400000,51.357143,
45.628571,49.942857,42.028571,52.714286]
df = pd.DataFrame(data=sales,index=datelist,columns=['sales'])
observations = df['sales']
size = df['sales'].size
shape = df['sales'].shape
maxdate = max(df.index).strftime("%Y-%m-%d")
mindate = min(df.index).strftime("%Y-%m-%d")
asc = seasonal_decompose(df, model='add')
if asc.seasonal[asc.seasonal.notnull()].size == df['sales'].size:
    seasonality = True
else:
    seasonality = False
# Check Stationarity
aftest = adfuller(df['sales'])
if aftest[1] <= 0.05:
    stationarity = True
else:
    stationarity = False
results = auto_arima(observations,
                     seasonal=seasonality,
                     stationary=stationarity,
                     m=12,
                     error_action="ignore")
~\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
584 " minimum of %d is required%s."
585 % (n_samples, array.shape, ensure_min_samples,
--> 586 context))
587
588 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
However, if I change the first value of the sales series from 26 to 30 it works.
What could be wrong here?
Your example is not reproducible, as seasonality and stationarity are currently not defined in the global scope. That leads to auto_arima throwing an error of the form
NameError: name 'seasonality' is not defined
You have only a few observations, so try explicitly setting the min/max order values for the different ARIMA processes. IMO, this is generally good practice. In your case we can do
fit = auto_arima(
    observations,
    start_p = 0, start_q = 0, start_P = 0, start_Q = 0,
    max_p = 3, max_q = 3, max_P = 3, max_Q = 3,
    D = 1, max_D = 2, m = 12,
    seasonal = True,
    error_action = 'ignore')
Here we consider processes up to MA(3) and AR(3), as well as SMA(3) and SAR(3).
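To see why so few observations leave little room for the model search, it may help to count what remains after differencing. The following is only a sketch in plain Python: `difference` is a hypothetical helper written for this illustration (it is not part of pmdarima), and the extra non-seasonal difference (d = 1) is an assumption, since auto_arima chooses d itself.

```python
# Sales figures copied from the question (24 monthly observations).
sales = [26.000000, 27.100000, 26.000000, 28.014286, 28.057143,
         30.128571, 39.800000, 33.000000, 37.971429, 45.914286,
         37.942857, 33.885714, 36.285714, 34.971429, 40.042857,
         27.157143, 30.685714, 35.585714, 43.400000, 51.357143,
         45.628571, 49.942857, 42.028571, 52.714286]


def difference(series, lag=1):
    """Hypothetical helper: return series[t] - series[t - lag] for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


m = 12  # seasonal period, as passed to auto_arima

# One seasonal difference (D = 1) compares each month with the same month a year earlier.
seasonal_diff = difference(sales, lag=m)

# Assumption for illustration: one additional non-seasonal difference (d = 1).
both_diff = difference(seasonal_diff, lag=1)

print(len(sales), len(seasonal_diff), len(both_diff))  # 24 12 11
```

With D = 1 and m = 12, only 12 of the 24 observations are left to estimate all AR, MA, SAR and SMA coefficients, which is why capping the orders is sensible for a series this short.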
Let's visualise the original time series data together with the forecast:
import matplotlib.pyplot as plt

n_ahead = 10
preds, conf_int = fit.predict(n_periods = n_ahead, return_conf_int = True)
xrange = pd.date_range(min(datelist), periods = 24 + n_ahead, freq = 'MS')

fig = plt.figure()
plt.plot(xrange[:df.shape[0]], df["sales"])
plt.plot(xrange[df.shape[0]:], preds)
plt.fill_between(
    xrange[df.shape[0]:],
    conf_int[:, 0], conf_int[:, 1],
    alpha = 0.1, color = 'b')
plt.show()