Search code examples
pythonarraysperformancenumpycovariance

Fast numpy covariance for arrays of different shape


I need to obtain the covariance between an array a of shape (M1, M2, N) and another array b of shape (N,).

What I currently do is use a for block:

import numpy as np

M1, M2, N = 23, 50, 117
a = np.random.uniform(-2., 2., (M1, M2, N))
b = np.random.uniform(-1., 1., N)

c = []
for i in range(M1):
    for j in range(M2):
        c.append(np.cov(a[i][j], b)[0, 1])

but it gets a bit slow for large (M1, M2). Is there a way to speed this up?


Solution

  • You can always calculate the cov by hand. Here are two suggestions using dot and einsum respectively.

    import numpy as np
    
    M1, M2, N = 23, 50, 117
    a = np.random.uniform(-2., 2., (M1, M2, N))
    b = np.random.uniform(-1., 1., N)
    
    c = []
    for i in range(M1):
        for j in range(M2):
            c.append(np.cov(a[i][j], b)[0, 1])
    
    c1 = np.reshape(c, (M1, M2))
    
    ac = a - a.mean(axis=-1, keepdims=True)
    bc = (b - b.mean()) / (N-1)
    
    c2 = np.dot(ac, bc)
    c3 = np.einsum('ijk,k->ij', ac, bc)
    print(np.allclose(c1, c2))
    print(np.allclose(c1, c3))
    

    Prints

    True
    True