
Numpy Covariance


When zero-mean centering is applied to a NumPy matrix, is there any difference expected between the following two pieces of code? I was taking Andrew Ng's ML course, where he suggests using X @ X.T to find the covariance matrix (assuming the data are already zero-mean). When I visually examine the matrices, this gives a different result than the np.cov function. Please help.

import numpy as np

X=np.random.randint(0,9,(3,3))
print(X)

[[2 1 5]
 [7 4 8]
 [4 7 6]]

X = (X - X.mean(axis=0)) # <- Zero Mean
print(X)

[[-2.33333333 -3.         -1.33333333]
 [ 2.66666667  0.          1.66666667]
 [-0.33333333  3.         -0.33333333]]

m = X.shape[0] # <- Number of observations
cov1 = (X @ X.T)/m # <- Find covariance manually as suggested in the course

print(cov1)

[[ 5.40740741 -2.81481481 -2.59259259]
 [-2.81481481  3.2962963  -0.48148148]
 [-2.59259259 -0.48148148  3.07407407]]

cov2 = np.cov(X,bias=True) # <- Find covariance with np.cov

print(cov2)

[[ 0.7037037   0.59259259 -1.2962963 ]
 [ 0.59259259  1.81481481 -2.40740741]
 [-1.2962963  -2.40740741  3.7037037 ]]

Solution

  • If your observations are in rows and variables are in columns (set rowvar to False), then it must be x.T @ x:

    import numpy as np
    
    x0 = np.array([[2, 1, 5], [7, 4, 8], [4, 7, 6]])
    x = x0 - x0.mean(axis=0)
    
    cov1 = x.T @ x / 3                         # divide by m = 3 observations (population covariance)
    cov2 = np.cov(x, rowvar=False, bias=True)  # bias=True divides by m instead of m - 1
    
    assert np.allclose(cov1, cov2)
    

    x @ x.T is for the case when the observations are in columns and variables are in the rows:

    x = x0 - x0.mean(axis=1)[:, None]  # center each row (variables are in rows)
    
    cov1 = x @ x.T / 3          # again divide by m = 3 observations
    cov2 = np.cov(x, bias=True) # rowvar=True by default
    
    assert np.allclose(cov1, cov2)
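
    One more detail worth knowing: np.cov subtracts the mean internally, so you don't have to center the data yourself before calling it. A minimal sketch using the same example matrix (the manual centering here is only needed for the x.T @ x formula):

    ```python
    import numpy as np

    x0 = np.array([[2, 1, 5], [7, 4, 8], [4, 7, 6]])
    x = x0 - x0.mean(axis=0)  # centered copy (columns = variables)

    # np.cov centers internally, so raw and pre-centered inputs agree.
    cov_raw = np.cov(x0, rowvar=False, bias=True)      # raw, uncentered data
    cov_centered = np.cov(x, rowvar=False, bias=True)  # manually centered data
    cov_manual = x.T @ x / x0.shape[0]                 # the course formula, divided by m

    assert np.allclose(cov_raw, cov_centered)
    assert np.allclose(cov_raw, cov_manual)
    ```

    This is why passing already-centered data to np.cov is harmless: re-subtracting a (now zero) mean changes nothing. The mismatch in the question came only from the rowvar orientation, not from the centering.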