NumPy - Covariance between rows of two matrices


I need to compute the covariance between corresponding rows of two different matrices, i.e. the covariance between the first row of the first matrix and the first row of the second matrix, and so on through the last row of both matrices. I can do it with an explicit loop, as in the code below. My question is: is it possible to avoid the "for loop" and get the same result with NumPy?

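import numpy as np

m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])

output = []
for a, b in zip(m1, m2):
    # np.cov(a, b) returns the 2x2 covariance matrix of the pair;
    # the off-diagonal entry [0, 1] is cov(a, b)
    cov = np.cov(a, b)
    output.append(cov[0, 1])
print(output)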

Thanks in advance!


Solution

  • If you are handling big arrays, I would consider this:

    from numba import jit
    import numpy as np
    
    
    m1 = np.random.rand(10000, 3)
    m2 = np.random.rand(10000, 3)
    
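    # nopython mode compiles the whole function without falling back to the interpreter
    @jit(nopython=True)
    def nb_cov(a, b):
        # Stacking on axis=1 gives shape (n_rows, 2, n_cols); each x is a
        # 2-row pair whose 2x2 covariance matrix holds cov(a_i, b_i) at [0, 1]
        return [np.cov(x)[0,1] for x in np.stack((a, b), axis=1)]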
    

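    This gives the following runtime (the first call includes JIT compilation, which is the likely cause of the slow-run warning):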

    >>> %timeit nb_cov(m1, m2)
    The slowest run took 94.24 times longer than the fastest. This could mean that an intermediate result is being cached.
    1 loop, best of 5: 10.5 ms per loop
    

    Compared with the same list comprehension without Numba:

    >>> %timeit [np.cov(x)[0,1] for x in np.stack((m1, m2), axis=1)]
    1 loop, best of 5: 410 ms per loop
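
For the loop-free version the question asks about, one option in plain NumPy is to center each row and sum the row-wise products. The following is a minimal sketch, not part of the answer above: the function name `rowwise_cov` and its `ddof` parameter are illustrative, and `ddof=1` is assumed in order to match the default normalization of `np.cov`.

```python
import numpy as np


def rowwise_cov(a, b, ddof=1):
    """Covariance between corresponding rows of a and b (hypothetical helper).

    Both inputs are assumed to be 2-D arrays of the same shape.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Subtract each row's mean so the products below are deviations from the mean
    a_centered = a - a.mean(axis=1, keepdims=True)
    b_centered = b - b.mean(axis=1, keepdims=True)
    # Sum the products along each row and divide by (n - ddof), as np.cov does
    return (a_centered * b_centered).sum(axis=1) / (a.shape[1] - ddof)


m1 = np.array([[1, 2, 3], [2, 2, 2]])
m2 = np.array([[2.56, 2.89, 3.76], [1, 2, 3.95]])

result = rowwise_cov(m1, m2)
print(result)
```

For the two example matrices this should agree with the loop in the question, giving approximately 0.6 for the first pair of rows and 0.0 for the second (the second row of `m1` is constant). Because it only uses elementwise operations and reductions, it avoids building a full 2x2 covariance matrix per row.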