I have X (n x d), Y (m x d), and positive-definite L (d x d). I want to calculate D where D_ij is (X_i - Y_i) * L * (X_i - Y_i).T. n and m are around 250; d is around 10^4.
I can use scipy.spatial.distance.cdist
, but this is very slow.
scipy.spatial.distance.cdist(X, Y, metric='mahalanobis', VI=L)
Looking at Dougal's answer to this question, I tried
diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
D = np.einsum('jik,kl,jil->ij', diff, L, diff)
Which is also very slow.
Is there a more efficient way to vectorize this computation?
Using a combination of np.tensordot
and np.einsum
helps in situations like these -
np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
Runtime test -
In [26]: n,m,d = 30,40,50
...: X = np.random.rand(n,d)
...: L = np.random.rand(d,d)
...: Y = np.random.rand(m,d)
...:
In [27]: diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
In [28]: %timeit np.einsum('jik,kl,jil->ij', diff, L, diff)
100 loops, best of 3: 7.81 ms per loop
In [29]: %timeit np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
1000 loops, best of 3: 472 µs per loop