Vectorizing (squared) mahalanobis distance in numpy

I have X (n x d), Y (m x d), and positive-definite L (d x d). I want to calculate D where D_ij is (X_i - Y_i) * L * (X_i - Y_i).T. n and m are around 250; d is around 10^4.

I can use scipy.spatial.distance.cdist, but this is very slow.

scipy.spatial.distance.cdist(X, Y, metric='mahalanobis', VI=L)

Looking at Dougal's answer to this question, I tried

    diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
    D = np.einsum('jik,kl,jil->ij', diff, L, diff)

Which is also very slow.

Is there a more efficient way to vectorize this computation?

Solution

Using a combination of np.tensordot and np.einsum helps in situations like these -

np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)

Runtime test -

In [26]: n,m,d = 30,40,50
    ...: X = np.random.rand(n,d)
    ...: L = np.random.rand(d,d)
    ...: Y = np.random.rand(m,d)
    ...: 

In [27]: diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]

In [28]: %timeit np.einsum('jik,kl,jil->ij', diff, L, diff)
100 loops, best of 3: 7.81 ms per loop

In [29]: %timeit np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
1000 loops, best of 3: 472 µs per loop