Search code examples
pythonnumpyvectorization

numpy vectorized sum over outer products?


I have two float arrays in python numpy: A of shape (60000,) and B of shape (60000,784). I attempted the following operation:

np.einsum('i,ijk->jk',A,B[:,:,np.newaxis]*B[:,np.newaxis,:])

but it produces an out of memory error, because creating an array of shape (60000,784,784) in the intermediate step would require 137GB of RAM.

I tried the following alternative way to do that:

S=np.zeros((784,784))
for a,b in izip(A,B):
    S+=a*np.outer(b,b)

but the explicit python sum is way too slow.

Is there a numpy way of doing that, using a vectorized function to have fast performance, but without generating any (60000,784,784) shape arrays in intermediate steps?


Solution

  • It seems that you can simplify the path and then it doesn't require that big of an intermediate array.

    But don't forget the optimize=True, otherwise it is quite slow.

    import numpy as np
    
    a, b = 60, 784
    
    A = np.random.rand(a)
    B = np.random.rand(a, b)
    
    R = np.einsum("i,ijk->jk", A, B[:, :, np.newaxis] * B[:, np.newaxis, :], optimize=True)
    R2 = np.einsum("i,ij,ik->jk", A, B, B, optimize=True)
    
    np.allclose(R, R2)  # True
    

    Performance is quite good on my machine:

    In [1]: a, b = 60_000, 784
       ...: A = np.random.rand(a)
       ...: B = np.random.rand(a, b)
       ...:
       ...: %timeit R = np.einsum("i,ij,ik->jk", A, B, B, optimize=True)
    641 ms ± 32.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)