
Performance of numpy all/any vs testing a single element


I create an array that does not contain a single zero (let's ignore that a zero could occur, with essentially zero probability, since np.random.rand() samples [0, 1) uniformly). I want to check whether all values are equal to zero (for another purpose the arrays may consist entirely of zeros). Below are some timings.

Surprisingly to me, checking a single (nonzero) element is about 2000 times faster than using np.all() or np.any(). I would have assumed that NumPy internally replaces np.all() with np.any() of the inverted condition, and that np.any()/np.all() returns True/False at the first element that fulfills/violates the condition (i.e. that it does not first create the entire array of True or False values).

How come np.all() and np.any() are that much slower when they would only have to check one element? Or is this because of the external knowledge I have that the array does not consist entirely of zeros? In the case of an all-zeros array, I guess it might be too slow to do the boolean comparison separately for each element. I don't know the performance of the underlying low-level algorithms, but each element needs to be accessed once regardless of whether the check goes element by element or creates the whole boolean array first.

import numpy as np

np.random.seed(100)
a = np.random.rand(10418,144)
%timeit a[0,0] == 0
%timeit (a == 0).all()
%timeit np.all(a == 0)
%timeit (a != 0).any()
%timeit np.any(a != 0)

# 400 ns ± 2.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 713 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 720 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 711 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 723 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Solution

  • When you write a == 0, NumPy creates a new boolean array, compares each element of a with 0, and stores the result in that array. This allocation, initialization, and subsequent deallocation are the reason for the high cost (see the timing sketch below).

    Note that you don't need the explicit a == 0 in the first place. Values that are zero always evaluate to False, nonzero values to True. So np.all(a) is equivalent to np.all(a != 0), and np.all(a == 0) is equivalent to not np.any(a) (see the equivalence check below).
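
    As a rough illustration of where the time goes, here is a minimal sketch (machine-dependent, reusing the array shape from the question) that times the comparison alone against the full reduction; the comparison by itself already pays for allocating and filling the temporary boolean array.

    import timeit
    import numpy as np

    np.random.seed(100)
    a = np.random.rand(10418, 144)

    # The comparison alone already allocates and fills a 10418x144 boolean array.
    t_cmp = timeit.timeit(lambda: a == 0, number=1000)

    # The full check only adds a cheap reduction over that boolean array.
    t_all = timeit.timeit(lambda: (a == 0).all(), number=1000)

    print(f"a == 0        : {t_cmp * 1000:.1f} µs per loop")
    print(f"(a == 0).all(): {t_all * 1000:.1f} µs per loop")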
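
    And a quick sanity check of the equivalences stated above (a minimal sketch, reusing the array from the question and adding an all-zeros array for contrast):

    import numpy as np

    np.random.seed(100)
    a = np.random.rand(10418, 144)   # contains no zeros (with overwhelming probability)
    z = np.zeros((10418, 144))       # all zeros

    for arr in (a, z):
        # Elements are treated as booleans: nonzero -> True, zero -> False,
        # so np.all(arr) gives the same result as np.all(arr != 0).
        assert np.all(arr) == np.all(arr != 0)
        # "all elements are zero" is the same as "no element is nonzero".
        assert np.all(arr == 0) == (not np.any(arr))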