Tags: python, numpy, variable-assignment, pre-allocation

Why is in-place assignment slower than creating a new array in NumPy?


I am trying to optimize some code that allocates arrays inside a function that is called repeatedly in a loop. I ran some performance tests in Jupyter, and the results were counterintuitive to me. The following is a minimal example.

Given two arrays A and B, I compute their matrix product in a loop.

  • Approach 1 uses no preallocation; the result is stored in a new array C.
  • Approach 2 uses a preallocated array D, into which the result of the multiplication is stored.

import numpy as np

A = np.random.rand(10, 10)
B = np.random.rand(10, 10000)
D = np.random.rand(10, 10000)

# Approach 1, no pre-allocation
for i in range(20000):
    C = A @ B

# Approach 2, pre-allocated D
for i in range(20000):
    D[:] = A @ B

I expected the second approach to be faster since it reuses the memory in D instead of allocating a new array each time. However, timing the loops shows that the first approach is actually 2x faster.
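For reference, a comparison like this can be reproduced outside a notebook with a minimal sketch along the following lines. The function names approach_1 and approach_2, the repetition count n, and the use of np.empty for D are illustrative assumptions, not the exact setup used for the measurement above, and the ratio will vary with machine, NumPy build, and array shapes.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 10))
B = rng.random((10, 10000))
D = np.empty((10, 10000))  # preallocated target for approach 2


def approach_1():
    # Bind the name C to whatever array the product returns.
    C = A @ B
    return C


def approach_2():
    # Compute the product, then assign it into D through a full slice.
    D[:] = A @ B
    return D


n = 200  # hypothetical repetition count, kept small so the sketch runs quickly
t1 = timeit.timeit(approach_1, number=n)
t2 = timeit.timeit(approach_2, number=n)

print(f"approach 1: {t1 / n * 1e6:.1f} us per call")
print(f"approach 2: {t2 / n * 1e6:.1f} us per call")
```

Wrapping each approach in a function and letting timeit call it repeatedly keeps the loop overhead the same for both variants, so the difference reflects only the statement being timed.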

Why is the in-place assignment (D[:] = A @ B) slower than creating a new array (C = A @ B)? Is this related to how NumPy manages memory?


Solution

  • You're not reusing D's memory. Both of your approaches allocate a new array every time. Your second approach then copies the contents of this new array into D, taking extra time to do so.

    If you want to directly write the results into D's memory, that'd be

    np.matmul(A, B, out=D)
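The difference between the two forms can be checked with a small sketch like the one below. The variable names tmp, addr_before, fresh_temporary, and same_buffer are illustrative, and D is created with np.empty here on the assumption that its initial contents do not matter. Note that out must already have the shape the product would have and a compatible dtype.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 10))
B = rng.random((10, 10000))
D = np.empty((10, 10000))  # preallocated output buffer

addr_before = D.ctypes.data  # address of D's data buffer

# A @ B creates a fresh array; the expression knows nothing about D.
tmp = A @ B
fresh_temporary = not np.shares_memory(tmp, D)

# Slice assignment copies the temporary's values into D's existing buffer.
D[:] = tmp

# out=D asks matmul to write the product into D's buffer directly.
result = np.matmul(A, B, out=D)

# D still points at the same buffer after both operations.
same_buffer = D.ctypes.data == addr_before

print("A @ B allocated a separate array:", fresh_temporary)
print("D kept its original buffer:", same_buffer)
print("matmul result shares memory with D:", np.shares_memory(result, D))
```

Whether out=D ends up measurably faster than C = A @ B depends on the array sizes and the underlying BLAS, so it is worth timing on the actual workload before restructuring code around it.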