When I use np.cov on a random dataset of 10 values, I get a 10x10 array as the answer. I think my data is not formatted correctly, but I'm not sure.
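import numpy as np

np.random.seed(1)
rho = 0.2
sigma = 1
# Use an array so the scalar scales the entries (int * list would repeat the list)
cov = (sigma**2) * np.array([[1, rho], [rho, 1]])
mean1 = (0, 0)
x1 = np.random.multivariate_normal(mean1, cov, 10)  # shape (10, 2)
mean1 = np.mean(x1)
cov1 = np.cov(x1)
print(cov1)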
This is the correct behavior: np.cov returns a covariance matrix.
By default, it treats each row of the input as a variable and each column as one observation of those variables. Your x1 has shape (10, 2), so its 10 rows are read as 10 variables with 2 observations each, which is why the result is 10x10. To reverse this behavior, pass rowvar=False.
If your two variables are stored as the two columns of a matrix, use np.cov(data, rowvar=False) (or, equivalently, np.cov(data.T)) to get a 2-by-2 covariance matrix. Its off-diagonal elements, cov[0,1] and cov[1,0], are the covariance between the two variables, and its diagonal elements are their variances.
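As a rough check of the above, here is a minimal sketch comparing the three calls on data shaped like the question's. The variable names (x, cov_rows, cov_cols, cov_t) and the use of np.random.default_rng are my own choices for illustration, not from the original post.

```python
import numpy as np

# Hypothetical setup mirroring the question: 10 samples of 2 correlated variables
rng = np.random.default_rng(1)
rho, sigma = 0.2, 1.0
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=10)  # shape (10, 2)

cov_rows = np.cov(x)                # default: each row is a variable -> (10, 10)
cov_cols = np.cov(x, rowvar=False)  # each column is a variable -> (2, 2)
cov_t = np.cov(x.T)                 # transposing first gives the same (2, 2) result

print(x.shape, cov_rows.shape, cov_cols.shape)
print(np.allclose(cov_cols, cov_t))
```

With only 10 samples, the off-diagonal entry of cov_cols is a noisy estimate and will generally not be close to rho.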
This is also discussed here.