I have two float arrays in Python NumPy: A of shape (60000,) and B of shape (60000, 784). I attempted the following operation:
np.einsum('i,ijk->jk',A,B[:,:,np.newaxis]*B[:,np.newaxis,:])
but it produces an out-of-memory error, because the intermediate array of shape (60000, 784, 784) would require 137 GB of RAM.
I tried the following alternative:
S = np.zeros((784, 784))
for a, b in zip(A, B):
    S += a * np.outer(b, b)
but the explicit Python loop is way too slow.
Is there a NumPy way of doing this with a vectorized function for speed, but without generating any intermediate arrays of shape (60000, 784, 784)?
It seems that you can simplify the contraction to a three-operand einsum, and then it doesn't require that big of an intermediate array. But don't forget optimize=True, otherwise it is quite slow.
import numpy as np

# Small sizes just to verify that both formulations agree
a, b = 60, 784
A = np.random.rand(a)
B = np.random.rand(a, b)

# Original formulation, materializing the (a, b, b) intermediate
R = np.einsum("i,ijk->jk", A, B[:, :, np.newaxis] * B[:, np.newaxis, :], optimize=True)
# Equivalent three-operand contraction; einsum picks a memory-friendly path
R2 = np.einsum("i,ij,ik->jk", A, B, B, optimize=True)

np.allclose(R, R2)  # True
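If you want to check which contraction order the optimizer actually picks, np.einsum_path reports it together with the size of the largest intermediate; a quick illustration using the arrays from the snippet above (the exact printout varies by NumPy version):

# Inspect the contraction path chosen for the three-operand form
path, info = np.einsum_path("i,ij,ik->jk", A, B, B, optimize="optimal")
print(info)  # shows the pairwise contractions and the largest intermediate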
Performance is quite good on my machine:
In [1]: a, b = 60_000, 784
...: A = np.random.rand(a)
...: B = np.random.rand(a, b)
...:
...: %timeit R = np.einsum("i,ij,ik->jk", A, B, B, optimize=True)
641 ms ± 32.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
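For what it's worth, the same weighted sum of outer products can also be written as a plain matrix product that hands the work to BLAS; a minimal sketch reusing A and B from above (R_blas and R_einsum are just illustrative names):

# Weighted Gram matrix: sum_i A[i] * outer(B[i], B[i]) == B.T @ diag(A) @ B,
# computed without forming diag(A) explicitly
R_blas = B.T @ (A[:, None] * B)
R_einsum = np.einsum("i,ij,ik->jk", A, B, B, optimize=True)
np.allclose(R_einsum, R_blas)  # True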