Difference between arima() with order (1,0,0) and running a regression on lagged values?

I'm working with time series in R and have a basic question: what is the difference between the following two pieces of code?

ar_1 <- lm(df$value ~ lag(df$value))
summary(ar_1)

arima_values <- arima(df$value, order=c(1,0,0))
arima_values

I need the coefficients, standard errors, etc., but the two pieces of code return different values for each. What is each piece of code doing? As I understand it, an AR(1) model is essentially a regression of the series on its first-order lagged values, so shouldn't the arima function give the same result?

Solution

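The two approaches agree to several decimal places once arima is told to use conditional sum of squares (method = "CSS") instead of its default "CSS-ML" fit, and once its intercept is converted as shown: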

# generate test series
set.seed(13)
n <- 25
mu <- 0.4
phi <- 0.8
s <- seq(0, length = n - 1)
x <- rnorm(1)
for(i in 2:n) x[i] <- mu + phi * x[i-1] + rnorm(1)

# lm
mod.lm <- lm(x[-1] ~ x[-n])
coef(mod.lm)
## (Intercept)       x[-n] 
##   0.7593169   0.7408584 

# arima - use conditional sum of squares; n.cond = 0 drops no initial observations beyond the one needed for the lag
mod.arima <- arima(x, c(1, 0, 0), method = "CSS", n.cond = 0)
co <- coef(mod.arima)
co
##       ar1 intercept 
## 0.7408535 2.9300719 

# arima's "intercept" is the mean of the series, not the regression intercept,
# so convert it before comparing with the lm intercept
with(as.list(co), intercept * (1 - ar1))
## [1] 0.7593179
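
The conversion follows from how the two functions write the same model. The symbols c (regression intercept) and m (series mean) are notation introduced here for illustration; they are not names used by R:

```latex
\begin{aligned}
\text{lm:}    \quad & x_t = c + \phi\, x_{t-1} + e_t \\
\text{arima:} \quad & x_t - m = \phi\,(x_{t-1} - m) + e_t
                      \;\Longrightarrow\; x_t = m\,(1 - \phi) + \phi\, x_{t-1} + e_t \\
\text{so}     \quad & c = m\,(1 - \phi)
\end{aligned}
```

With the estimates above, 2.9300719 * (1 - 0.7408535) is approximately 0.7593, which is the lm intercept.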

ar gives the same estimates when asked for an OLS fit with an intercept and no demeaning:

mod.ar <- ar(x, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)
mod.ar
## 
## Call:
## ar(x = x, order.max = 1, method = "ols", demean = FALSE, intercept = TRUE)
##
## Coefficients:
##      1  
## 0.7409  
##
## Intercept: 0.7593 (0.3695)
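
Since the question also asks for standard errors, here is a minimal sketch of how they might be extracted from the lm and arima fits. It regenerates the same test series so it runs on its own; the names se.lm, se.arima and slope are made up for this example. The standard errors need not match exactly, because the two functions estimate the error variance differently, and the standard error arima reports for its intercept refers to the mean of the series rather than to the regression intercept, so only the slope is compared.

```r
# Regenerate the test series used in the answer
set.seed(13)
n <- 25
mu <- 0.4
phi <- 0.8
x <- rnorm(1)
for (i in 2:n) x[i] <- mu + phi * x[i - 1] + rnorm(1)

# Fit the same AR(1) model two ways
mod.lm <- lm(x[-1] ~ x[-n])
mod.arima <- arima(x, c(1, 0, 0), method = "CSS", n.cond = 0)

# Standard errors: square roots of the diagonal of each covariance matrix
se.lm <- sqrt(diag(vcov(mod.lm)))
se.arima <- sqrt(diag(vcov(mod.arima)))

# Slope estimate and standard error side by side
# (lm: second coefficient is the slope; arima: first coefficient is ar1)
slope <- rbind(
  lm    = c(estimate = unname(coef(mod.lm)[2]),    se = unname(se.lm[2])),
  arima = c(estimate = unname(coef(mod.arima)[1]), se = unname(se.arima[1]))
)
print(slope)
```

summary(mod.lm) reports the same lm standard errors along with t-statistics, and printing mod.arima shows its standard errors under the coefficients.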