Number of used observations for estimation when using R's 'arima' command

I observe some weird behavior with R's arima function. Maybe I am missing something, so I am asking for help.

If we want to compare two ARIMA models, let's say ARIMA(1,0,0) and ARIMA(2,0,0), with an information criterion, we have to ensure that the number of observations is the same across estimations (otherwise it would be an unfair comparison). I am providing a small example below.

data("consump", package = "wooldridge")

mod1 = arima(consump$gc_1[4:nrow(consump)], order=c(1,0,0))
mod2 = arima(consump$gc_1[3:nrow(consump)], order=c(2,0,0))

I load the consump dataset from the wooldridge package and estimate an AR(1) and an AR(2) model. The dataset comprises 37 observations, but the first two observations of the variable gc_1 are NA. Hence, I use only observations 4-37 (i.e., 34 observations) for the AR(1) model. Since I lose one observation to the lag, I should end up with 33 observations for the model, residuals, etc. Similarly, for the AR(2) model I use observations 3-37 (i.e., 35 observations), but here I lose two observations, so I should (again) end up with 33 observations for the model, residuals, etc.

However, somehow magically the arima command provides me with 34 and 35 observations, residuals, etc. I can check this via

mod1$nobs
[1] 34
mod2$nobs
[1] 35

Moreover, the n.cond component is zero in both cases.

mod1$n.cond
[1] 0
mod2$n.cond
[1] 0

I would have expected this to be 1 and 2, respectively. Similarly, mod1$residuals and mod2$residuals contain 34 and 35 observations, respectively. I wonder how this is possible: shouldn't we lose one observation due to the lag structure in the AR(1) process?

Finally, the AIC command also tells me that something is wrong:

AIC(mod1,mod2)
     df       AIC
mod1  3 -201.3758
mod2  4 -206.6710
Warning message:
In AIC.default(mod1, mod2) :
  models are not all fitted to the same number of observations

I am a bit confused about what exactly the arima command is doing. Maybe I am misunderstanding something, or this is erroneous behavior.

Thanks a lot in advance.

Solution

The arima function uses full maximum likelihood estimation, not conditional likelihood or conditional least squares. So all non-missing observations can be used for a stationary model. There is no conditioning required.
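
To see why no observation is lost, it may help to write out the exact Gaussian likelihood for a stationary AR(1) with mean $\mu$ (the symbols $\mu$, $\phi$, $\sigma^2$ are notation introduced here for illustration, not taken from the question):

$$
L(\mu,\phi,\sigma^2) = f(y_1)\prod_{t=2}^{n} f(y_t \mid y_{t-1}),
\qquad
y_1 \sim N\!\left(\mu,\ \frac{\sigma^2}{1-\phi^2}\right),
\qquad
y_t \mid y_{t-1} \sim N\!\left(\mu + \phi\,(y_{t-1}-\mu),\ \sigma^2\right).
$$

The first observation enters through its stationary marginal density $f(y_1)$, so all $n$ observations contribute. A conditional approach drops the $f(y_1)$ term and keeps only the $n-1$ conditional densities, which is where the "lost" observation in the question comes from.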

If you set method = "CSS", you will get the n.cond results you are expecting.
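
A minimal sketch of that difference, using a simulated AR(1) series rather than the consump data (the seed, the series length, and all object names are assumptions made for illustration):

```r
# Simulated stand-in for the series in the question (assumption: not the consump data)
set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 35))

# Exact maximum likelihood: no initial observations are conditioned on
fit_ml_ar1 <- arima(y, order = c(1, 0, 0), method = "ML")
fit_ml_ar2 <- arima(y, order = c(2, 0, 0), method = "ML")

# Conditional sum of squares: the first p observations are conditioned on
fit_css_ar1 <- arima(y, order = c(1, 0, 0), method = "CSS")
fit_css_ar2 <- arima(y, order = c(2, 0, 0), method = "CSS")

# Compare the number of conditioning observations reported by each fit
c(fit_ml_ar1$n.cond, fit_ml_ar2$n.cond)
c(fit_css_ar1$n.cond, fit_css_ar2$n.cond)
```

With these calls, n.cond should come out as 0 for both ML fits and as 1 and 2 for the AR(1) and AR(2) CSS fits.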

The AIC command is simply telling you that the models have different numbers of observations, which they do because you didn't use the same data set.
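
For a like-for-like comparison, one option is to pass the same vector to both fits. The sketch below again uses a simulated series in place of consump$gc_1 (the seed and object names are assumptions made for illustration):

```r
# Simulated stand-in for the non-missing part of the series (assumption)
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 35))

# Fit both models to exactly the same observations
ar1 <- arima(y, order = c(1, 0, 0), method = "ML")
ar2 <- arima(y, order = c(2, 0, 0), method = "ML")

# Both fits report the same number of observations
c(ar1$nobs, ar2$nobs)

# AIC() now compares fits based on identical data
aic_table <- AIC(ar1, ar2)
aic_table
```

Applied to the question's data, this would mean using the same index range (for example 3:nrow(consump)) in both arima calls.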