
Covariance of a matrix in Python


I want to find the covariance of a 10304*280 matrix (i.e. 280 variables, each measured on 10304 subjects), and I am using the following NumPy function to compute it:

cov = numpy.cov(matrix)

I expected a 280*280 matrix as a result, but it returned a 10304*10304 matrix.


Solution

  • As suggested in the previous answer, you can change the orientation of your data so that each row is a variable, which is what np.cov expects by default. An easy way to do this in 2D is simply to transpose the matrix:

    import numpy as np
    r = np.random.rand(100, 10)
    np.cov(r).shape # is (100,100)
    np.cov(r.T).shape # is (10,10)
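    To see why the transpose gives the 10*10 result: np.cov treats each row as a variable by default, so after transposing, each of the 10 columns of r becomes a variable. Here is a minimal sketch that computes the same sample covariance by hand (center each column, then divide the cross-product by n - 1, which is np.cov's default normalization) and compares it with np.cov(r.T). The random data and the seed are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random((100, 10))  # 100 observations (rows) of 10 variables (columns)

# Center each column (variable) on its own mean.
centered = r - r.mean(axis=0)

# Sample covariance with the default n - 1 normalization.
n = r.shape[0]
manual = centered.T @ centered / (n - 1)

print(manual.shape)                      # (10, 10)
print(np.allclose(manual, np.cov(r.T)))  # True
```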
    

    Alternatively, you can pass rowvar=False, which tells np.cov that each column is a variable and each row is an observation (see the numpy.cov documentation):

    import numpy as np
    r = np.random.rand(100, 10)
    np.cov(r).shape # is (100,100)
    np.cov(r, rowvar=False).shape # is (10,10)
    

    I think this might be more performant, especially for large matrices, since you avoid transposing the axes yourself.
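    As a sanity check on the question's own shape, both spellings return a 280*280 matrix and agree numerically. Random data stands in for the real 10304*280 matrix here, which is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Same orientation as in the question: 10304 rows (subjects/observations),
# 280 columns (variables). Random values stand in for the real data.
matrix = rng.random((10304, 280))

cov_t = np.cov(matrix.T)               # transpose first
cov_rv = np.cov(matrix, rowvar=False)  # or declare columns as variables

print(cov_t.shape, cov_rv.shape)   # (280, 280) (280, 280)
print(np.allclose(cov_t, cov_rv))  # True
```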

    UPDATE:

    I thought about this and wondered whether the algorithm actually differs between rowvar=True and rowvar=False. Well, as it turns out, when rowvar=False, numpy simply transposes the array itself :P.

    You can confirm this in the source code of numpy.cov.

    So, in terms of performance, there is essentially no difference between the two versions.
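    If you want to check this on your own machine, here is a rough timing sketch. The matrix size mirrors the question, and timings depend on hardware and NumPy version, so no numbers are claimed here. It also shows why the explicit transpose is cheap: in NumPy, r.T is a view that shares memory with r, so no data is copied by the transpose itself.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
r = rng.random((10304, 280))

# Transposing does not copy data: r.T is a view onto the same buffer.
print(np.shares_memory(r, r.T))  # True

# Time both spellings; expect comparable results, since np.cov
# transposes internally when rowvar=False.
t_transpose = timeit.timeit(lambda: np.cov(r.T), number=3)
t_rowvar = timeit.timeit(lambda: np.cov(r, rowvar=False), number=3)
print(f"np.cov(r.T):             {t_transpose:.3f} s")
print(f"np.cov(r, rowvar=False): {t_rowvar:.3f} s")
```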