Why does a manual autocorrelation not match acf() results?


I'm trying to understand acf and pacf, but I do not understand why the acf() result does not match a simple cor() between the series and its lag-1 copy.

I have simulated a time series:

set.seed(100)
ar_sim <- arima.sim(list(order = c(1,0,0), ar = 0.4), n = 100)

ar_sim_t <- ar_sim[1:99]
ar_sim_t1 <- ar_sim[2:100]

cor(ar_sim_t, ar_sim_t1)   ## 0.1438489
acf(ar_sim)[[1]][2]        ## 0.1432205

Could you please explain why the first lag correlation in acf() does not exactly match the manual cor() between the series and lag1?


Solution

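  • acf() does not compute cor() between the two shifted subseries. cor() centres each subseries on its own mean and scales it by its own standard deviation, whereas the usual autocorrelation estimator (see, for instance, the Wikipedia article on autocorrelation) uses the mean and variance of the whole series: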

    n <- length(ar_sim)
    l <- 1
    mu <- mean(ar_sim)
    s <- sd(ar_sim)
    
    sum((ar_sim_t - mu)*(ar_sim_t1 - mu))/((n - l)*s^2)
    #[1] 0.1432205
    

    This value agrees with the one computed by the built-in stats::acf up to floating-point rounding: all.equal() returns TRUE, identical() returns FALSE, and the difference is on the order of 1e-16.

    a.stats <- acf(ar_sim, plot = FALSE)$acf[2]   # element 1 is lag 0, element 2 is lag 1
    a.manual <- sum((ar_sim_t - mu)*(ar_sim_t1 - mu))/((n - l)*s^2)
    
    all.equal(a.stats, a.manual)  # TRUE
    identical(a.stats, a.manual)  # FALSE
    
    a.stats - a.manual
    #[1] 1.110223e-16
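
To make the difference between the two estimators concrete, here is a minimal sketch that computes both by hand. The helpers manual_acf() and manual_cor() are hypothetical names introduced only for illustration; they are not part of stats. The sketch assumes the default acf() settings (type = "correlation", demean = TRUE), under which every lag is divided by the lag-0 sum of squares. At lag 1 that denominator equals (n - l)*s^2 from the formula above, which is why the two agree there; for larger lags the sketch keeps the acf() denominator rather than (n - l)*s^2.

```r
# Sketch: compare cor() on the shifted series with the acf()-style estimator.
set.seed(100)
ar_sim <- arima.sim(list(order = c(1, 0, 0), ar = 0.4), n = 100)
x <- as.numeric(ar_sim)
n <- length(x)

# Hypothetical helper: lag-k autocorrelation in the style of acf(),
# using the overall mean and the lag-0 sum of squares as denominator.
manual_acf <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

# Hypothetical helper: Pearson correlation of the two shifted subseries,
# each centred on its own mean and scaled by its own sum of squares.
manual_cor <- function(x, k) {
  n <- length(x)
  a <- x[1:(n - k)]
  b <- x[(1 + k):n]
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

# acf() returns lag 0 first, so drop it to keep lags 1..5.
acf_vals <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)[-1]

# Both comparisons are expected to hold up to floating-point tolerance.
all.equal(sapply(1:5, function(k) manual_acf(x, k)), acf_vals)
all.equal(manual_cor(x, 1), cor(x[1:(n - 1)], x[2:n]))
```

The two helpers differ only in how they centre and scale the products, so the gap between cor() and acf() at lag 1 comes entirely from using subseries statistics instead of whole-series statistics.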