Search code examples
pythonnumpybroadcasting

Product and sum without addtional memory allocation


Is there away to multiply two arrays and sum along an axis (or multiple axes) without allocating extra memory?

In this example:

import numpy as np

A = np.random.random((10, 10, 10))
B = np.random.random((10, 10, 10))

C = np.sum(A[:, None, :, :, None] * B[None, :, None, :, :], axis=(-1,-2))

When computing C, an intermediate matrix of size 10x10x10x10x10 is created only to be collapsed immediately. Is there a way to avoid this in numpy?


Solution

  • This looks like a dot product with the transpose of the second array:

    C = A @ B.T
    

    NB. The operation in the original question was: C = np.sum(A[:, None, :] * B[None, :, :], axis=-1).

    Quick check:

    C1 = np.sum(A[:, None, :] * B[None, :, :], axis=-1)
    C2 = A @ B.T
    
    assert np.allclose(C1, C2)
    

    You can generalize with einsum:

    np.einsum('ikl,jlm->ijk', A, B)
    

    Quick check:

    A = np.random.random((2, 3, 4))
    B = np.random.random((2, 4, 5))
    
    #             i    j   k  l    m         i   j    k   l  m
    C1 = np.sum(A[:, None, :, :, None] * B[None, :, None, :, :], axis=(-1,-2))
    C2 = np.einsum('ikl,jlm->ijk', A, B)
    
    assert np.allclose(C1, C2)