Search code examples
rstatisticscovariance

Implementing covariance by hand


I made a curious observation today - I implemented the covariance of two vectors 3 different ways, and got 2 different answers. In R, methods 1 and 2 are the same. Method 3 ought to be mathematically the same, but somehow it returns a different number. What is the bug here?

# Make some data
set.seed(100)
n = 100
foo = rnorm(n)
bar = rnorm(n)

# method 1
cov(foo, bar)

# method 2
sum(
  (foo - mean(foo)) * (bar - mean(bar))
) / (n-1)

# method 3
(
  sum(foo*bar) - 
    mean(bar)*sum(foo) -
    mean(foo)*sum(bar) + 
    mean(foo)*mean(bar)
) / (n-1)

Solution

  • You forgot that the mathematical expression sum_{i=1}^n mean(foo)*mean(bar) comes out to n*mean(foo)*mean(bar), not mean(foo)*mean(bar) ...

    (sum(foo*bar) - 
        mean(bar)*sum(foo) -
        mean(foo)*sum(bar) + n*mean(foo)*mean(bar))/(n-1)