python arrays performance numpy covariance

Fast numpy covariance for arrays of different shape

I need to obtain the covariance between an array a of shape (M1, M2, N) and another array b of shape (N,).

What I currently do is use a for block:

import numpy as np

M1, M2, N = 23, 50, 117
a = np.random.uniform(-2., 2., (M1, M2, N))
b = np.random.uniform(-1., 1., N)

c = []
for i in range(M1):
    for j in range(M2):
        c.append(np.cov(a[i][j], b)[0, 1])

but it gets a bit slow for large (M1, M2). Is there a way to speed this up?

Solution

You can always calculate the cov by hand. Here are two suggestions using dot and einsum respectively.

import numpy as np

M1, M2, N = 23, 50, 117
a = np.random.uniform(-2., 2., (M1, M2, N))
b = np.random.uniform(-1., 1., N)

c = []
for i in range(M1):
    for j in range(M2):
        c.append(np.cov(a[i][j], b)[0, 1])

c1 = np.reshape(c, (M1, M2))

ac = a - a.mean(axis=-1, keepdims=True)
bc = (b - b.mean()) / (N-1)

c2 = np.dot(ac, bc)
c3 = np.einsum('ijk,k->ij', ac, bc)
print(np.allclose(c1, c2))
print(np.allclose(c1, c3))

Prints

True
True