Search code examples
rtime-seriesforecasting

Strange behavior of auto.arima in R-package forecast


I am trying to use the R-package forecast to fit arima models (with the function Arima) and automatically select an appropriate model (with the function auto.arima). I first estimated two possible models with the function Arima:

tt.1 <- Arima(x, order=c(1,0,1), seasonal=list(order=c(0,1,1)), 
              include.drift=F)
tt.2 <- Arima(x, order=c(1,0,1), seasonal=list(order=c(0,1,0)),
              include.drift=F)

Then, I used the function auto.arima to automatically select an appropriate model for the same data. I fixed d=0 and D=1 just as in the two models above. Furthermore, I set the maximum to 1 for all other parameters, did not use approximation of the selection criterion and did not use stepwise selection (note that the settings I use here are only for demonstration of the strange behavior, not what I really intend to use). I used BIC as criterion for selection the model. Here is the function call:

tt.auto <- auto.arima(x, ic="bic", approximation=F, seasonal=T, stepwise=F, 
                  max.p=1, max.q=1, max.P=1, max.Q=1, d=0, D=1, start.p=1, 
                  start.q=1, start.P=1, start.Q=1, trace=T, 
                  allowdrift=F)

Now, I would have expected that auto.arima selects the model with the lower BIC from the two models above or a model not estimated above by Arima. Furthermore, I would have expected that the output generated by auto.arima when trace=T is exactly the same as the BIC calculated by Arima for the two models above. This is indeed true for the second model but not for the first one. For the first model, the BIC calculated by Arima is 10405.81 but the screen output of auto.arima for the model (1,0,1)(0,1,1) is Inf. Consequently, the second model is selected by auto.arima although the first model has a lower BIC when comparing the two models estimated by Arima. Does anyone have an idea why the BIC calculated by Arima does not correspond to the BIC calculated by auto.arima in case of the first model?

Here is the screen output of auto.arima:

 ARIMA(0,0,0)(0,1,0)[96]                    : 11744.63
 ARIMA(0,0,0)(0,1,1)[96]                    : Inf
 ARIMA(0,0,0)(1,1,0)[96]                    : Inf
 ARIMA(0,0,0)(1,1,1)[96]                    : Inf
 ARIMA(0,0,1)(0,1,0)[96]                    : 11404.67
 ARIMA(0,0,1)(0,1,1)[96]                    : Inf
 ARIMA(0,0,1)(1,1,0)[96]                    : Inf
 ARIMA(0,0,1)(1,1,1)[96]                    : Inf
 ARIMA(1,0,0)(0,1,0)[96]                    : 11120.72
 ARIMA(1,0,0)(0,1,1)[96]                    : Inf
 ARIMA(1,0,0)(1,1,0)[96]                    : Inf
 ARIMA(1,0,0)(1,1,1)[96]                    : Inf
 ARIMA(1,0,1)(0,1,0)[96]                    : 10984.75
 ARIMA(1,0,1)(0,1,1)[96]                    : Inf
 ARIMA(1,0,1)(1,1,0)[96]                    : Inf
 ARIMA(1,0,1)(1,1,1)[96]                    : Inf

And here are summaries of the models calculated by Arima:

> summary(tt.1)
Series: x 
ARIMA(1,0,1)(0,1,1)[96]                    

Coefficients:
         ar1      ma1     sma1
      0.9273  -0.5620  -1.0000
s.e.  0.0146   0.0309   0.0349

sigma^2 estimated as 867.7:  log likelihood=-5188.98
AIC=10385.96   AICc=10386   BIC=10405.81

Training set error measures:
                  ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 0.205128 28.16286 11.14871 -7.171098 18.42883 0.3612059 -0.03466711
> summary(tt.2)
Series: x 
ARIMA(1,0,1)(0,1,0)[96]                    

Coefficients:
         ar1      ma1
      0.9148  -0.4967
s.e.  0.0155   0.0320

sigma^2 estimated as 1892:  log likelihood=-5481.93
AIC=10969.86   AICc=10969.89   BIC=10984.75

Training set error measures:
                ME     RMSE      MAE       MPE     MAPE    MASE        ACF1
Training set 0.1942746 41.61086 15.38138 -8.836059 24.55919 0.49834 -0.02253845

Note: I am not allowed to make the data available. But I would be happy to provide more output or run modified calls of the functions if necessary.

EDIT: I now looked at the source code of auto.arima and found out that the behavior is caused by a check on the roots which sets the information criterion used for selecting a model to Inf if the model fails the check. The paper cited in the help for auto.arima confirms that (Hyndman, R.J. and Khandakar, Y. (2008) "Automatic time series forecasting: The forecast package for R", Journal of Statistical Software, 26(3), page 11). Sorry for the question, I should have read the paper before asking a question here!


Solution

  • auto.arima tries to find the best model subject to some constraints, avoiding models with parameters that are close to the non-stationarity and non-invertibility boundaries.

    Your tt.1 model has a seasonal MA(1) parameter of -1 which lies on the non-invertibility boundary. So you don't want to use that model as it will lead to numerical instabilities. The seasonal difference operator is confounded with the seasonal MA operator.

    Internally, auto.arima gives an AIC/AICc/BIC value of Inf to any model that doesn't satisfy the constraints to avoid it being selected.