Tags: numpy, memory-optimization, dot-product

Numpy: Reduce memory footprint of dot product with random data


I have a large NumPy array that I want to project linearly using a matrix of randomly generated values.

>>> input_array.shape
(50, 200000)
>>> random_array = np.random.normal(size=(200000, 300))
>>> output_array = np.dot(input_array, random_array)

Unfortunately, random_array takes up a lot of memory (200000 × 300 float64 values, roughly 480 MB), and my machine starts swapping. It seems to me that I don't actually need all of random_array around at once; in theory, I ought to be able to generate it lazily during the dot product calculation, but I can't figure out how.

How can I reduce the memory footprint of the calculation of output_array from input_array?


Solution

  • This obviously isn't the fastest solution, but have you tried generating the random matrix one column at a time:

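    import numpy as np

    m, inner = input_array.shape
    n = 300
    out = np.empty((m, n))
    for i in range(n):  # range replaces Python 2's xrange
        # Draw one column of the random matrix at a time, so only `inner` values are held.
        out[:, i] = np.dot(input_array, np.random.normal(size=inner))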
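If the column-at-a-time loop turns out to be too slow, one possible middle ground is to generate the random matrix in blocks of rows and accumulate the partial products. The sketch below is only an illustration of that idea, not a tuned implementation: the function name `random_projection_chunked` and the parameters `chunk_size` and `seed` are hypothetical names chosen for this example, and it assumes a NumPy version that provides `np.random.default_rng`.

```python
import numpy as np


def random_projection_chunked(input_array, n_components, chunk_size=10_000, seed=None):
    """Compute input_array @ R for a Gaussian R without holding all of R in memory.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, inner = input_array.shape
    out = np.zeros((m, n_components))
    for start in range(0, inner, chunk_size):
        stop = min(start + chunk_size, inner)
        # Only a (stop - start, n_components) block of the random matrix exists at a time.
        block = rng.normal(size=(stop - start, n_components))
        # Each block of rows of R pairs with the matching block of columns of the input.
        out += input_array[:, start:stop] @ block
    return out
```

With the shapes from the question, `random_projection_chunked(input_array, 300)` would hold one 10000 × 300 float64 block (about 24 MB) at a time instead of the full 480 MB matrix. A larger `chunk_size` trades memory for fewer, bigger matrix products.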