Execution time of inequality with numpy array

I am getting two widely different execution times for the two loops below:

import numpy as np
import time

array = np.arange(0, 750000)
param = 20000

t1 = time.time()
for _ in range(param):
  array <= 120
print(round(time.time() - t1), _)
# 9 19999

t2 = time.time()
for _ in range(param):
  array - 120 <= 0
print(round(time.time() - t2), _)
# 19 19999

I expected the execution times of the two approaches to be similar.

What's the reason for this difference? Is NumPy internally casting 120 to an array in the second approach?

What other similar bottlenecks should I be aware of when optimising code? Happy to read docs on that. Thanks!

Solution

NumPy can't perform array - 120 <= 0 as a single fused operation, or rewrite the expression as array <= 120. It needs to perform the operation as the two steps written:

array - 120

and

result <= 0

and each of these operations builds a new 750000-element array: one holding the subtraction results and one holding the comparison results.

That's much slower than comparing each element to 120 and building an array of comparison results directly, as array <= 120 does.
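
To make the two steps concrete, here is a minimal sketch that spells out the intermediate array and times both expressions with timeit. The names direct, intermediate, two_step, t_direct and t_two_step are only illustrative, the repetition count of 100 is an arbitrary choice, and the actual timings and integer dtype will vary by machine and platform.

```python
import timeit

import numpy as np

array = np.arange(0, 750000)

# Direct comparison: one pass over the data, one new boolean array.
direct = array <= 120

# What `array - 120 <= 0` does, written out step by step.
intermediate = array - 120      # first new array, same integer dtype as `array`
two_step = intermediate <= 0    # second new array, boolean

# Both approaches produce the same mask.
assert np.array_equal(direct, two_step)

# The temporary integer array is larger than the boolean result.
print(intermediate.dtype, intermediate.nbytes)
print(two_step.dtype, two_step.nbytes)

# Time each expression; 100 repetitions is an arbitrary, illustrative count.
t_direct = timeit.timeit(lambda: array <= 120, number=100)
t_two_step = timeit.timeit(lambda: array - 120 <= 0, number=100)
print(f"direct: {t_direct:.3f}s, two-step: {t_two_step:.3f}s")
```

When the constant can be moved to the other side of the comparison by hand, as it can here, doing so avoids allocating and filling the temporary array altogether.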