Tags: r, time-series, arima

Difference between arima(1,0,0) function and running a regression on lag values?


I'm currently working with time series in R and have a few basic questions. Mainly, what is the difference between the following two pieces of code?

ar_1 <- lm(df$value ~ lag(df$value))
summary(ar_1)
arima_values <- arima(df$value, order=c(1,0,0))
arima_values

I essentially need the coefficients, standard errors, etc., but the two pieces of code above return different values for each. What is each piece of code doing? The general formula for AR(1) essentially amounts to running a regression on the first-order lagged values, correct? Shouldn't the arima function achieve the same thing?


Solution

  • They give the same values to several decimals if the arguments to arima are set as shown:

    # generate test series
    set.seed(13)
    n <- 25
    mu <- 0.4
    phi <- 0.8
    x <- rnorm(1)
    # AR(1) recursion: x[t] = mu + phi * x[t-1] + e[t]
    for(i in 2:n) x[i] <- mu + phi * x[i-1] + rnorm(1)
    
    # lm
    mod.lm <- lm(x[-1] ~ x[-n])
    coef(mod.lm)
    ## (Intercept)       x[-n] 
    ##   0.7593169   0.7408584 
    
    # arima - use conditional sum of squares and drop 0 observations    
    mod.arima <- arima(x, c(1, 0, 0), method = "CSS", n.cond = 0)
    co <- coef(mod.arima)
    co
    ##       ar1 intercept 
    ## 0.7408535 2.9300719 
    
    # arima defines intercept differently so use this to compare to lm intercept
    with(as.list(co), intercept * (1 - ar1))  
    ## [1] 0.7593179
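
    The conversion in the last step follows from how arima parameterizes the model. Per the arima documentation, the value labelled intercept is really the series mean m in (x[t] - m) = ar1 * (x[t-1] - m) + e[t], so expanding gives a regression intercept of m * (1 - ar1). A minimal sketch that rebuilds the same kind of test series and puts the two intercepts side by side (fit.lm, fit.css, phi.hat and m.hat are illustrative names, not part of the original answer):

    ```r
    # Rebuild a small AR(1) test series so the sketch is self-contained
    set.seed(13)
    n <- 25
    x <- rnorm(1)
    for (i in 2:n) x[i] <- 0.4 + 0.8 * x[i - 1] + rnorm(1)

    # Regression form: x[t] = c + phi * x[t-1] + e[t]
    fit.lm <- lm(x[-1] ~ x[-n])

    # Mean form, fitted by conditional sum of squares:
    # (x[t] - m) = phi * (x[t-1] - m) + e[t]
    fit.css <- arima(x, order = c(1, 0, 0), method = "CSS", n.cond = 0)

    phi.hat <- coef(fit.css)[["ar1"]]
    m.hat   <- coef(fit.css)[["intercept"]]  # arima's "intercept" is the mean m

    # Expanding the mean form gives c = m * (1 - phi)
    c(lm = coef(fit.lm)[["(Intercept)"]], arima = m.hat * (1 - phi.hat))
    ```

    The two printed numbers should agree closely but not exactly, since arima reaches its estimate by numerical optimization while lm solves the least-squares problem directly.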
    

    We can also use ar with the appropriate arguments:

    mod.ar <- ar(x, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)
    mod.ar
    ## 
    ## Call:
    ## ar(x = x, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)
    ##
    ## Coefficients:
    ##      1  
    ## 0.7409  
    ##
    ## Intercept: 0.7593 (0.3695)
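
    The question also asks about standard errors. A hedged sketch of one way to pull them out of both fits, again on a rebuilt test series and with illustrative names (fit.lm, fit.css, se.lm, se.arima) that are not part of the original answer:

    ```r
    # Rebuild a small AR(1) test series so the sketch is self-contained
    set.seed(13)
    n <- 25
    x <- rnorm(1)
    for (i in 2:n) x[i] <- 0.4 + 0.8 * x[i - 1] + rnorm(1)

    fit.lm  <- lm(x[-1] ~ x[-n])
    fit.css <- arima(x, order = c(1, 0, 0), method = "CSS", n.cond = 0)

    # lm: standard errors come from the coefficient table of summary()
    se.lm <- summary(fit.lm)$coefficients[, "Std. Error"]

    # arima: square roots of the diagonal of the estimated covariance matrix
    se.arima <- sqrt(diag(vcov(fit.css)))

    se.lm
    se.arima
    ```

    These need not agree as closely as the coefficients do. lm scales the residual sum of squares by the residual degrees of freedom, whereas arima derives its covariance matrix from a numerically computed Hessian, and the standard error arima reports for intercept refers to the mean rather than to the regression intercept.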