I'm trying to understand acf and pacf, but I do not understand why the acf() result does not match a simple cor() at lag 1.
I have simulated a time series:
set.seed(100)
ar_sim <- arima.sim(list(order = c(1,0,0), ar = 0.4), n = 100)
ar_sim_t <- ar_sim[1:99]
ar_sim_t1 <- ar_sim[2:100]
cor(ar_sim_t, ar_sim_t1) ## 0.1438489
acf(ar_sim)[[1]][2] ## 0.1432205
Could you please explain why the first lag correlation in acf() does not exactly match the manual cor() between the series and lag1?
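The difference comes from how the two functions centre and scale the data. cor() treats ar_sim_t and ar_sim_t1 as two separate variables, so each one is centred on its own mean and scaled by its own standard deviation. acf() instead centres both on the mean of the whole series and divides by the sum of squares of the whole series. At lag 1 this is exactly what the estimator of the autocorrelation of a discrete process with known mean and variance (see, for instance, Wikipedia) gives when the mean and variance are replaced by their sample estimates: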
n <- length(ar_sim)
l <- 1
mu <- mean(ar_sim)
s <- sd(ar_sim)
sum((ar_sim_t - mu)*(ar_sim_t1 - mu))/((n - l)*s^2)
#[1] 0.1432205
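To make the difference concrete, here is a minimal sketch that reproduces both numbers by hand. It re-creates the simulation from the question; the names x, y, r_cor and r_acf are introduced only for illustration and are not part of any package.

```r
# Reproduce cor() and acf() at lag 1 by hand to show where they differ.
set.seed(100)
ar_sim <- arima.sim(list(order = c(1, 0, 0), ar = 0.4), n = 100)
n <- length(ar_sim)
x <- as.numeric(ar_sim[1:(n - 1)])  # x_1, ..., x_{n-1}
y <- as.numeric(ar_sim[2:n])        # x_2, ..., x_n

# cor(): each subseries is centred on its OWN mean and scaled by its OWN sd
r_cor <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# acf(): both subseries are centred on the mean of the WHOLE series,
# and the denominator is the sum of squares of the WHOLE series
mu <- mean(ar_sim)
r_acf <- sum((x - mu) * (y - mu)) / sum((ar_sim - mu)^2)

all.equal(r_cor, cor(x, y))                         # TRUE
all.equal(r_acf, acf(ar_sim, plot = FALSE)$acf[2])  # TRUE
```

The only change between the two formulas is which mean is subtracted and which sum of squares goes in the denominator, which is why the two results are close but not equal.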
This value is not bit-for-bit identical to the one computed by the built-in stats::acf, but the two differ only by floating-point rounding:
a.stats <- acf(ar_sim, plot = FALSE)$acf[2]  # lag-1 value, without drawing the plot
a.manual <- sum((ar_sim_t - mu)*(ar_sim_t1 - mu))/((n - l)*sd(ar_sim)^2)
all.equal(a.stats, a.manual) # TRUE
identical(a.stats, a.manual) # FALSE
a.stats - a.manual
#[1] 1.110223e-16
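The same idea extends to any lag. The sketch below uses a hypothetical helper, manual_acf(), that follows the acf() recipe (overall mean, full-series sum of squares in the denominator) and checks it against stats::acf for lags 1 to 10. It also illustrates that the (n - l)*s^2 denominator used above agrees with acf() only at lag 1, because (n - 1)*s^2 is exactly the total sum of squares; at lag 2 the two results differ by the factor (n - 1)/(n - 2).

```r
# Hypothetical helper: sample autocorrelation at lag k, computed the way
# stats::acf does (demean with the overall mean, divide by the full sum of squares).
manual_acf <- function(x, k) {
  x <- as.numeric(x)
  n <- length(x)
  mu <- mean(x)
  num <- sum((x[1:(n - k)] - mu) * (x[(1 + k):n] - mu))
  num / sum((x - mu)^2)
}

set.seed(100)
ar_sim <- arima.sim(list(order = c(1, 0, 0), ar = 0.4), n = 100)

lags <- 1:10
a.manual <- sapply(lags, function(k) manual_acf(ar_sim, k))
a.stats  <- acf(ar_sim, lag.max = 10, plot = FALSE)$acf[lags + 1]  # drop lag 0
all.equal(a.manual, a.stats)  # TRUE

# The (n - l) * s^2 denominator only coincides with acf() at lag 1.
n  <- length(ar_sim)
mu <- mean(ar_sim)
s  <- sd(ar_sim)
k  <- 2
wiki_lag2 <- sum((ar_sim[1:(n - k)] - mu) * (ar_sim[(1 + k):n] - mu)) /
  ((n - k) * s^2)
all.equal(wiki_lag2 / a.stats[2], (n - 1) / (n - 2))  # TRUE
```

So if you want numbers that match acf() at every lag, keep sum((x - mean(x))^2) as the denominator rather than (n - l)*s^2.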