I am trying to create a seasonal ARIMA (SARIMA) model using pmdarima's AutoARIMA. New data will become available over the lifetime of the project, so I need code that automatically finds the best time series model. Unfortunately, my current code seems to be producing garbage:
import pmdarima as pm
import pandas as pd
train_data = pd.read_csv("test.csv", header=None, names=["Value"])["Value"]
model = pm.AutoARIMA(seasonal=True, m=168, trace=True)
model.fit(train_data.fillna(0))
Output (so far, after quite some time on a large server):
Performing stepwise search to minimize aic
ARIMA(2,1,2)(1,0,1)[168] intercept : AIC=inf, Time=4041.19 sec
ARIMA(0,1,0)(0,0,0)[168] intercept : AIC=-35451.160, Time=1.07 sec
ARIMA(1,1,0)(1,0,0)[168] intercept : AIC=inf, Time=15118.06 sec
ARIMA(0,1,1)(0,0,1)[168] intercept : AIC=-35951.886, Time=3805.77 sec
ARIMA(0,1,0)(0,0,0)[168] : AIC=-35453.123, Time=0.56 sec
ARIMA(0,1,1)(0,0,0)[168] intercept : AIC=-35723.198, Time=2.69 sec
ARIMA(0,1,1)(1,0,1)[168] intercept : AIC=inf, Time=61326.67 sec
ARIMA(0,1,1)(0,0,2)[168] intercept : AIC=inf, Time=39971.60 sec
ARIMA(0,1,1)(1,0,0)[168] intercept : AIC=-36054.745, Time=4211.60 sec
ARIMA(0,1,1)(2,0,0)[168] intercept : AIC=-36344.782, Time=30668.84 sec
The data has two seasonal patterns (one daily and one weekly). Including a daily pattern (using m=24) gives sensible results, but weekly tends to cause AIC=inf, as in the example above.
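To put numbers on how much of the search went into the candidates that came back as inf, the trace above can be tallied with a few lines of standard-library Python. This is only a sketch: the regular expression assumes the trace format shown above, and the names `parse_trace`, `TRACE`, and `LINE_RE` are made up for this example and are not part of pmdarima.

```python
import math
import re

# The ten candidate lines copied verbatim from the trace above.
TRACE = """\
ARIMA(2,1,2)(1,0,1)[168] intercept : AIC=inf, Time=4041.19 sec
ARIMA(0,1,0)(0,0,0)[168] intercept : AIC=-35451.160, Time=1.07 sec
ARIMA(1,1,0)(1,0,0)[168] intercept : AIC=inf, Time=15118.06 sec
ARIMA(0,1,1)(0,0,1)[168] intercept : AIC=-35951.886, Time=3805.77 sec
ARIMA(0,1,0)(0,0,0)[168] : AIC=-35453.123, Time=0.56 sec
ARIMA(0,1,1)(0,0,0)[168] intercept : AIC=-35723.198, Time=2.69 sec
ARIMA(0,1,1)(1,0,1)[168] intercept : AIC=inf, Time=61326.67 sec
ARIMA(0,1,1)(0,0,2)[168] intercept : AIC=inf, Time=39971.60 sec
ARIMA(0,1,1)(1,0,0)[168] intercept : AIC=-36054.745, Time=4211.60 sec
ARIMA(0,1,1)(2,0,0)[168] intercept : AIC=-36344.782, Time=30668.84 sec
"""

# Assumed line layout: "<model> [intercept] : AIC=<value>, Time=<seconds> sec"
LINE_RE = re.compile(
    r"(ARIMA\(\d+,\d+,\d+\)\(\d+,\d+,\d+\)\[\d+\])\s*(intercept)?\s*:"
    r"\s*AIC=([^,]+), Time=([\d.]+) sec"
)


def parse_trace(text):
    """Turn trace lines into dicts; float() parses 'inf' as infinity."""
    rows = []
    for model, intercept, aic, seconds in LINE_RE.findall(text):
        rows.append({
            "model": model,
            "intercept": bool(intercept),
            "aic": float(aic),
            "seconds": float(seconds),
        })
    return rows


rows = parse_trace(TRACE)
failed = [r for r in rows if math.isinf(r["aic"])]
best = min((r for r in rows if math.isfinite(r["aic"])), key=lambda r: r["aic"])
total_seconds = sum(r["seconds"] for r in rows)
failed_seconds = sum(r["seconds"] for r in failed)

print(f"candidates: {len(rows)}, returned inf: {len(failed)}")
print(f"time on inf candidates: {failed_seconds:.0f} of {total_seconds:.0f} sec")
print(f"best finite AIC so far: {best['model']} ({best['aic']})")
```

For this trace, the total comes to roughly 159,000 seconds (about 44 hours), and about three quarters of that went into the four candidates reported as inf.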
The issue seems to have been that pmdarima gives up on a fit after some time and reports an AIC of inf in place of the value it never calculated. I ended up doing a conventional analysis and settling on a slightly oversized SARIMA model, which takes longer to fit but definitely includes all relevant effects.
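For anyone wanting to go the same route, here is a rough sketch of what that could look like. It is not the exact analysis I ran: the series is synthetic (hourly, with a daily and a weekly cycle, standing in for the real data), `acf_at` and `fit_fixed_sarima` are hypothetical helper names, and the orders are left as arguments because they depend on the data. The only pmdarima call used is `pm.ARIMA(order=..., seasonal_order=...)` followed by `.fit(...)`, which fits one fixed model instead of searching.

```python
import numpy as np


def acf_at(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Synthetic stand-in for the real series: 8 weeks of hourly values with a
# daily (24 h) and a weekly (168 h) cycle plus a little noise. Made up for
# illustration only.
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 8)
series = (
    np.sin(2 * np.pi * t / 24)
    + 0.5 * np.sin(2 * np.pi * t / 168)
    + 0.1 * rng.standard_normal(t.size)
)

# Strong positive autocorrelation at both seasonal lags suggests that both
# periods matter; half a day apart the values are anti-correlated.
for lag in (12, 24, 168):
    print(f"lag {lag:>3}: acf = {acf_at(series, lag):+.2f}")


def fit_fixed_sarima(y, order, seasonal_order):
    """Fit one fixed-order SARIMA with pmdarima, skipping the stepwise search.

    The import is local so the ACF check above runs without pmdarima.
    """
    import pmdarima as pm

    model = pm.ARIMA(order=order, seasonal_order=seasonal_order)
    return model.fit(y)


# Example call (orders are placeholders to be chosen from your own analysis):
# model = fit_fixed_sarima(train_data, order=(p, d, q), seasonal_order=(P, D, Q, 168))
```

On real data you would probably difference the series first and look at the whole ACF/PACF rather than three lags. The point is only that the orders are picked once by hand and then refit as new data arrives.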