Getting the AR(p) values from ARMA model in R

I am interested to get the values of the AR(p) part from any ARIMA model in R.

For example, suppose I estimate the following ARIMA model

$y_{t} = \rho_{1} y_{t-1} + \rho_{2} y_{t-2} + \epsilon_{t} + \phi \epsilon_{t-1}$

I am interested to get these values defined as:

$\rho_{1} y_{t-1} + \rho_{2} y_{t-2} = y_{t} - \epsilon_{t} - \phi \epsilon_{t-1}$

What I do is this:

  ar_model <- auto.arima(mydata) # fit an ARIMA model as decided by the auto.arima
  fitted_ar_model <- fitted(ar_model)
  resid_ar_model <- residuals(ar_model)
  ar_p <- fitted_ar_model - resid_ar_model

The thing with code is that I am not sure whether the residuals(ar_model) include only the contemporaneous residuals or also the moving average part.

Solution

I will attempt an answer here. Unfortunately, writing formulas in SO is really cumbersome. So, firstly, I think you need to make a distinction between the "epsilon" of your DGP (data generation process), also called shocks or errors (which are unobservable) and the model residuals. Shocks are unobservable because you don't know the true values of the model parameters (in this case the AR and MA parameters) - you can only estimate them. So there is a difference (in a simple AR(1) model without a constant for ease of writing using a "hat" notation for estimates) between y_t - phi_1*y_{t-1} and y_t - \hat{phi_1}*y_{t-1} - the former being the "shock" and the latter being the "residual". Typically you would also have as notation \hat{y_t} = \hat{phi_1}*y_{t-1} and so residual defined as e_t = y_t - \hat{y_t}.

Now so, given that, I think what you would like to get is y_t - e_t - \hat{phi}*e_{t-1}, \hat{phi} being the estimated MA1 parameter. The following is an example where I have explicitly fitted an ARMA(2,1) without a constant to the LakeHuron time series. I have looked at the model residuals using residuals (resid in df) and also calculated by hand (resid2 in df) - they differ a bit due to approximating initial conditions that the residuals method makes which I didn't really dig into. You will see that the values get closer and closer as t increases and the influence of the initial conditions diminishes (this is due to the fact that we have a stationary ARMA model). In the same way I have calculated the "AR_fit" as I call it once as ar_1 * y_{t-1} + ar_2 * y_{t-2} (ar_fit1 in df) and a second time as y_t - e_t - ma_1 * e_{t-1} (ar_fit2 in df). Again, you will see that the values converge as t increases (due to the same initial conditions used to calculate the residuals using the residuals function).

library(forecast)
data <-  LakeHuron - mean(LakeHuron)

nobs <- length(data)

ar_model <- arima(data, order = c(2, 0, 1), include.mean = FALSE)
fitted <-  fitted(ar_model)
resid <- residuals(ar_model)

ar1 <- ar_model$coef[1]
ar2 <- ar_model$coef[2]
ma1 <- ar_model$coef[3]

df <- data.frame(data, fitted, resid)

resid2 = data[3:nobs] - ar1*data[2:(nobs-1)] - ar2*data[1:(nobs-2)] - ma1*resid[2:(nobs-1)]
resid2 <- c(NA, NA, resid2)
df$resid2 <- resid2

ar_fit1 <- ar1*data[2:(nobs-1)] + ar2*data[1:(nobs-2)]
ar_fit1<- c(NA, NA, ar_fit1)
df$ar_fit1 <- ar_fit1   

ar_fit2 <- data[3:nobs] - resid[3:nobs] - ma1*resid[2:(nobs-1)]
ar_fit2<- c(NA, NA, ar_fit2)
df$ar_fit2 <- ar_fit2