python numpy covariance

numpy.cov() returns unexpected output


I have a dataset X with 9 features and 683 rows (683x9). I want the covariance matrix of X and another dataset with the same shape as X. I use np.cov(originalData, generatedData, rowvar=False) to get it, but it returns a covariance matrix of shape 18x18. I expected a 9x9 covariance matrix. Can you please help me fix this?


Solution

  • The function np.cov calculates the covariance for every pair of variables that you give it. You have 9 variables in one array and 9 more in the other, which makes 18 in total, so you get an 18x18 matrix. (Under the hood, cov concatenates the two arrays you gave it before calculating the covariance.)

    If you are only interested in the covariance of the variables from the first array with the variables from the second, take the first 9 rows and the last 9 columns (the upper-right block):

    C = np.cov(originalData, generatedData, rowvar=False)[:9, 9:]
    

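    Or in general, with two matrices X and Y that have the same number of rows but not necessarily the same number of columns, split at the number of columns of X in both dimensions:

    C = np.cov(X, Y, rowvar=False)[:X.shape[1], X.shape[1]:]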
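As a sanity check, the block slicing can be compared against a direct computation of the cross-covariance. The sketch below uses random stand-in data (the names X, Y, full, cross and cross_direct are illustrative, not from the question) and gives Y a different number of columns on purpose, to show that the row and column split both fall at X.shape[1]:

```python
import numpy as np

# Random stand-ins for the real data (assumption: same number of rows).
rng = np.random.default_rng(0)
X = rng.normal(size=(683, 9))   # plays the role of originalData
Y = rng.normal(size=(683, 4))   # second dataset, deliberately 4 columns

# np.cov treats the 9 + 4 columns as 13 variables.
full = np.cov(X, Y, rowvar=False)

# Upper-right block: variables of X (rows) against variables of Y (columns).
p = X.shape[1]
cross = full[:p, p:]

# The same block computed directly from centered data,
# using the default normalization of np.cov (divide by N - 1).
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cross_direct = Xc.T @ Yc / (X.shape[0] - 1)

print(full.shape)    # shape of the full matrix
print(cross.shape)   # shape of the cross-covariance block
print(np.allclose(cross, cross_direct))
```

If the full 18x18 matrix is never needed, the direct computation avoids building it, at the cost of centering the data by hand.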