Tags: python, numpy, covariance-matrix

np.cov giving unexpected number of values


When I use np.cov on a random dataset of 10 samples, I get a 10x10 array as the answer. I think my data is not formatted correctly, but I'm not sure.

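import numpy as np

np.random.seed(1)
rho = 0.2
sigma = 1
# use an array here: with a plain list, int * list repeats the list instead of scaling it
cov = (sigma**2) * np.array([[1, rho], [rho, 1]])
mean1 = (0, 0)
x1 = np.random.multivariate_normal(mean1, cov, 10)  # shape (10, 2)
mean1 = np.mean(x1)  # single scalar: mean over all 20 values
cov1 = np.cov(x1)    # this call returns the unexpected 10x10 array
print(cov1)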

Solution

  • This is the correct behavior: np.cov returns a covariance matrix with one row and one column per variable.

    By default, it treats each row of the input as a variable and each column as an observation of those variables. Your x1 has shape (10, 2), so np.cov sees 10 variables with 2 observations each and returns a 10x10 matrix. To reverse this behavior, pass rowvar=False.

    So if you have two variables stored as two columns of a matrix, you can use np.cov(data, rowvar=False) (or np.cov(data.T)) to get a 2x2 covariance matrix, in which the elements at cov[0,1] and cov[1,0] are the covariance between the two variables.

    This is also discussed here.
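
To make the shape difference concrete, here is a minimal sketch along the lines of the question's setup. The variable names (x, cov_default, cov_cols, cov_t) are made up for illustration, and the seed and parameters are simply copied from the question.

```python
import numpy as np

np.random.seed(1)
rho, sigma = 0.2, 1.0
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])

# 10 samples of 2 variables: one sample per row, one variable per column
x = np.random.multivariate_normal([0.0, 0.0], cov, 10)
print(x.shape)  # (10, 2)

# Default (rowvar=True): each of the 10 rows is treated as a variable
cov_default = np.cov(x)
print(cov_default.shape)  # (10, 10)

# Columns as variables: either pass rowvar=False or transpose the data
cov_cols = np.cov(x, rowvar=False)
cov_t = np.cov(x.T)
print(cov_cols.shape)  # (2, 2)
print(np.allclose(cov_cols, cov_t))  # True
```

With only 10 samples, the off-diagonal entries of cov_cols are a noisy estimate and will not necessarily land close to rho = 0.2.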