Python (NumPy): Memory efficient array multiplication with fancy indexing

I'm looking to do fast matrix multiplication in python, preferably NumPy, of an array A with another array B of repeated matrices by using a third array I of indices. This can be accomplished using fancy indexing and matrix multiplication:

from numpy.random import rand, randint

A = rand(1000,5,5)
B = rand(40000000,5,1)
I = randint(low=0, high=1000, size=40000000)

A[I] @ B

However, this creates the intermediate array A[I] of shape (40000000, 5, 5) which overflows the memory. It seems highly inefficient to have to repeat a small set of matrices for multiplication, and this is essentially a more general version of broadcasting such as A[0:1] @ B which has no issues.

Are there any alternatives?

I have looked at NumPy's einsum function but have not seen any support for utilizing an index vector in the call.

Solution

If you're open to another package, you could wrap it up with dask.

from numpy.random import rand, randint
from dask import array as da

A = da.from_array(rand(1000,5,5))
B = da.from_array(rand(40000000,5,1))
I = da.from_array(randint(low=0, high=1000, size=40000000))

fancy = A[I] @ B

After finished manipulating, then bring it into memory using fancy.compute()