I made a curious observation today - I implemented the covariance of two vectors 3 different ways, and got 2 different answers. In R, methods 1 and 2 are the same. Method 3 ought to be mathematically the same, but somehow it returns a different number. What is the bug here?
# Make some data
set.seed(100)
n = 100
foo = rnorm(n)
bar = rnorm(n)
# method 1
cov(foo, bar)
# method 2
sum(
(foo - mean(foo)) * (bar - mean(bar))
) / (n-1)
# method 3
(
sum(foo*bar) -
mean(bar)*sum(foo) -
mean(foo)*sum(bar) +
mean(foo)*mean(bar)
) / (n-1)
You forgot that the mathematical expression sum_{i=1}^n mean(foo)*mean(bar)
comes out to n*mean(foo)*mean(bar)
, not mean(foo)*mean(bar)
...
(sum(foo*bar) -
mean(bar)*sum(foo) -
mean(foo)*sum(bar) + n*mean(foo)*mean(bar))/(n-1)