I observe some weird behavior with R's `arima` function. Maybe I am missing something, so I am asking for help.
If we want to compare two ARIMA models, say ARIMA(1,0,0) and ARIMA(2,0,0), with an information criterion, we have to ensure that the number of observations is the same across both estimations (otherwise the comparison would be unfair). Here is a small example:
```r
data("consump", package = "wooldridge")
mod1 = arima(consump$gc_1[4:nrow(consump)], order = c(1, 0, 0))
mod2 = arima(consump$gc_1[3:nrow(consump)], order = c(2, 0, 0))
```
I load the dataset `consump` from the wooldridge package and estimate an AR(1) and an AR(2) model. The dataset comprises 37 observations, but the first two observations of the variable `gc_1` are `NA`. Hence, I use only observations 4-37 (i.e., 34 observations) for the AR(1) model. Since I lose one observation to the lag, I should end up with 33 observations for the model, residuals, etc.
Similarly, for the AR(2) model I use observations 3-37 (i.e., 35 observations), but here I lose two observations, so I should (again) end up with 33 observations for the model, residuals, etc.
However, the `arima` command somehow gives me 34 and 35 observations, residuals, etc. I can check this via:

```
mod1$nobs
[1] 34
mod2$nobs
[1] 35
```
Finally, the `n.cond` component of the fitted object is zero in both cases:

```
mod1$n.cond
[1] 0
mod2$n.cond
[1] 0
```
I would have expected these to be 1 and 2, respectively. Similarly, `mod1$residuals` and `mod2$residuals` contain 34 and 35 values, respectively. How is this possible? Shouldn't we lose one observation due to the lag structure of the AR(1) process?
Finally, the `AIC` command also tells me that something is wrong:

```
AIC(mod1, mod2)
     df       AIC
mod1  3 -201.3758
mod2  4 -206.6710
Warning message:
In AIC.default(mod1, mod2) :
  models are not all fitted to the same number of observations
```
I am a bit confused about what exactly the `arima` command is doing. Maybe I am misunderstanding something, or this is erroneous behavior. Thanks a lot in advance.
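The `arima` function estimates by full (exact) maximum likelihood, not by conditional likelihood or conditional least squares. With the default `method = "CSS-ML"`, conditional sum of squares is only used to find starting values; the final estimates maximize the exact likelihood. So all non-missing observations can be used for a stationary model, and no conditioning on initial observations is required.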
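A minimal sketch of this, assuming a simulated AR(1) series in place of `consump$gc_1` (the seed, the coefficient 0.5, and the series itself are illustrative choices, not the question's data). Two leading `NA`s mimic the question's variable; because of them, `arima` falls back to `method = "ML"`, and every non-missing value enters the likelihood regardless of the AR order:

```r
# Simulated stand-in for consump$gc_1 (assumption): 37 values, the first two missing
set.seed(1)
y <- c(NA, NA, arima.sim(model = list(ar = 0.5), n = 35))

# Exact maximum likelihood handles the missing values itself,
# so no observations are dropped to "start" the lag structure
fit_ar1 <- arima(y, order = c(1, 0, 0))
fit_ar2 <- arima(y, order = c(2, 0, 0))

fit_ar1$nobs    # 35 non-missing observations
fit_ar2$nobs    # also 35: the AR order does not change the count
fit_ar1$n.cond  # 0: nothing is conditioned on
fit_ar2$n.cond  # 0
```

This is the same pattern as in the question: the AR(1) and AR(2) fits see different numbers of observations only because they were given different input vectors.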
If you set `method = "CSS"`, you will get the `n.cond` values you are expecting.
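To see the conditioning explicitly, here is a sketch with `method = "CSS"`, again on a simulated series (an assumption, standing in for the question's data). The `n.cond` argument of `arima` can also be raised so that the AR(1) fit ignores the same two initial observations as the AR(2) fit, which gives the like-for-like sample the question was aiming for:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 35)  # simulated stand-in (assumption)

# Conditional sum of squares: the first p observations serve only as lags
css_ar1 <- arima(y, order = c(1, 0, 0), method = "CSS")
css_ar2 <- arima(y, order = c(2, 0, 0), method = "CSS")
css_ar1$n.cond  # 1
css_ar2$n.cond  # 2

# Condition the AR(1) fit on two initial observations as well,
# so both fits use residuals from observations 3 to 35
css_ar1_same <- arima(y, order = c(1, 0, 0), method = "CSS", n.cond = 2)
css_ar1_same$n.cond  # 2
```

Keep in mind that the `arima` documentation describes the returned `aic` as valid only for maximum likelihood fits, so this route is mainly useful for inspecting the conditional fits rather than for an `AIC()` comparison.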
The `AIC` command is simply telling you that the models have different numbers of observations, which they do because you didn't use the same data set. Under full maximum likelihood, a fair comparison means fitting both models to the same series.
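A sketch of that like-for-like comparison: pass the same vector to both fits. The series here is simulated (an assumption); with the question's data this would mean using, for example, `consump$gc_1[3:nrow(consump)]` in both calls. Both models then report the same `nobs`, and `AIC()` compares them without the warning:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 35)  # simulated stand-in (assumption)

# Same data for both models, hence the same number of observations
fit_ar1 <- arima(y, order = c(1, 0, 0), method = "ML")
fit_ar2 <- arima(y, order = c(2, 0, 0), method = "ML")
c(fit_ar1$nobs, fit_ar2$nobs)  # 35 35

AIC(fit_ar1, fit_ar2)  # lower AIC is preferred
```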