This Cython function returns a random element among the elements of a numpy array that are within certain limits:
cdef int search(np.ndarray[int] pool):
    cdef np.ndarray[int] limited
    limited = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    return np.random.choice(limited)
This works just fine. However, this function is critical to the performance of my code. Typed memoryviews are apparently much faster than numpy arrays, but they cannot be filtered in the same way as above.
How could I write a function that does the same thing as above using typed memoryviews? Or is there another way to improve the performance of the function?
Okay, let's start by making the code more general; I'll come to the performance aspect later.
I generally don't use:
import numpy as np
cimport numpy as np
Personally I like using a different name for the cimported package because it helps to keep the C side and the NumPy-Python side apart. So for this answer I'll use
import numpy as np
cimport numpy as cnp
Also I'll make lower_limit and upper_limit arguments of the function. Maybe these are defined statically (or globally) in your case, but it makes the example more standalone. So the starting point is a slightly modified version of your code:
cpdef int search_1(cnp.ndarray[int] pool, int lower_limit, int upper_limit):
    cdef cnp.ndarray[int] limited
    limited = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    return np.random.choice(limited)
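A quick smoke test (the pool dtype has to match C int exactly, and np.intc is NumPy's dtype for the platform's C int; the chosen element is random, so your value will differ):
>>> import numpy as np
>>> search_1(np.arange(100, dtype=np.intc), 10, 20)
14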
One very nice feature in Cython is fused types, so you can easily generalize this function for different types. Your approach will only work for 32-bit integer arrays (at least if int is 32-bit on your computer). It is very easy to support more array types:
ctypedef fused int_or_float:
    cnp.int32_t
    cnp.int64_t
    cnp.float32_t
    cnp.float64_t

cpdef int_or_float search_2(cnp.ndarray[int_or_float] pool, int_or_float lower_limit, int_or_float upper_limit):
    cdef cnp.ndarray[int_or_float] limited
    limited = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    return np.random.choice(limited)
Of course you can add more types if you want. The advantage is that the new version works where the old version failed:
>>> search_1(np.arange(100, dtype=np.float_), 10, 20)
ValueError: Buffer dtype mismatch, expected 'int' but got 'double'
>>> search_2(np.arange(100, dtype=np.float_), 10, 20)
19.0
Now that it's more general let's take a look at what your function actually does:
Why create so many arrays? I mean, you could simply count how many elements are within the limits, take a random integer between 0 and the number of elements within the limits, and then take whatever element would be at that index in the result array.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef int_or_float search_3(cnp.ndarray[int_or_float] arr, int_or_float lower_bound, int_or_float upper_bound):
    cdef int_or_float element
    # Count the number of elements that are within the limits
    cdef Py_ssize_t num_valid = 0
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            num_valid += 1
    # Take a random index
    cdef Py_ssize_t random_index = np.random.randint(0, num_valid)
    # Go through the array again and take the element at the random index that
    # is within the bounds
    cdef Py_ssize_t clamped_index = 0
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            if clamped_index == random_index:
                return element
            clamped_index += 1
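A quick check (again, the returned element is random, so the exact value will differ):
>>> search_3(np.arange(100, dtype=np.float64), 10, 20)
16.0
One caveat worth knowing: if no element falls within the limits, num_valid stays 0 and np.random.randint(0, 0) raises a ValueError (the filtering versions would instead fail inside np.random.choice on an empty array).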
It won't be much faster but it will save a lot of memory. And because you don't have intermediate arrays you don't need memoryviews at all - but if you like you can just replace the cnp.ndarray[int_or_float] arr in the argument list with int_or_float[:] arr or even int_or_float[::1] arr and operate on the memoryview (it probably won't be faster but it also won't be slower); a sketch follows.
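For concreteness, a minimal sketch of that memoryview variant, assuming the same imports and fused type as above (search_3_mv is my name for it, not from the original; the body is identical to search_3, only the argument type changes - int_or_float[::1] would additionally assert C-contiguity):
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef int_or_float search_3_mv(int_or_float[:] arr, int_or_float lower_bound, int_or_float upper_bound):
    cdef int_or_float element
    cdef Py_ssize_t index, num_valid = 0
    # First pass: count the elements within the limits
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            num_valid += 1
    # Pick a random position among the valid elements
    cdef Py_ssize_t random_index = np.random.randint(0, num_valid)
    # Second pass: return the element at that position
    cdef Py_ssize_t clamped_index = 0
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            if clamped_index == random_index:
                return element
            clamped_index += 1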
I generally prefer numba to Cython (at least if only I am using it), so let's compare it to the numba version of that code:
import numba as nb
import numpy as np

@nb.njit
def search_numba(arr, lower, upper):
    num_valids = 0
    for item in arr:
        if item >= lower and item <= upper:
            num_valids += 1
    random_index = np.random.randint(0, num_valids)
    valid_index = 0
    for item in arr:
        if item >= lower and item <= upper:
            if valid_index == random_index:
                return item
            valid_index += 1
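One thing to keep in mind if you time this yourself: numba compiles on the first call, so warm the function up before measuring, along these lines (simple_benchmark, used below, calls each function repeatedly anyway, which amortizes the compile cost):
arr = np.random.randint(0, 100, size=2**20)
search_numba(arr, 5, 50)          # first call triggers the JIT compilation
%timeit search_numba(arr, 5, 50)  # later calls run the compiled code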
And also to a numexpr variant:
import numexpr
np.random.choice(arr[numexpr.evaluate('(arr >= l) & (arr <= u)')])
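Note that numexpr.evaluate looks up arr, l, and u by name in the calling scope, so as a standalone snippet it needs the variables defined first, for example:
import numpy as np
import numexpr

arr = np.random.randint(0, 100, size=1000)
l, u = 5, 50
result = np.random.choice(arr[numexpr.evaluate('(arr >= l) & (arr <= u)')])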
Okay, let's do a benchmark:
from simple_benchmark import benchmark, MultiArgument
arguments = {2**i: MultiArgument([np.random.randint(0, 100, size=2**i, dtype=np.int_), 5, 50]) for i in range(2, 22)}
funcs = [search_1, search_2, search_3, search_numba, search_numexpr]
b = benchmark(funcs, arguments, argument_name='array size')
So by not using intermediate arrays you can be roughly 5 times faster, and if you use numba you can get another factor of 5 (it seems like I'm missing some possible Cython optimizations there; numba is typically about as fast as Cython, or up to ~2 times faster). So you could get it ~20 times faster with the numba solution.
numexpr isn't really comparable here, mostly because you cannot do the boolean array indexing inside numexpr: it only evaluates the mask, and the actual indexing (and hence the intermediate arrays) still happens in NumPy.
The differences will depend on the contents of the array and on the limits, so you also have to measure the performance within your actual application.
As an aside: If the lower limit and the upper limit generally don't change, the fastest solution would be to filter the array once and then just call np.random.choice on it several times. That could be orders of magnitude faster.
lower_limit = ...
upper_limit = ...
filtered_array = pool[(pool >= lower_limit) & (pool <= upper_limit)]
def search_cached():
    return np.random.choice(filtered_array)
%timeit search_cached()
2.05 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So almost 1000 times faster and doesn't need Cython or numba at all. But it's a special case that may not be useful for you.
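If the limits do change but tend to repeat, the same idea can be wrapped in a small closure-based helper (make_search is my own hypothetical name, just to illustrate the caching):
import numpy as np

def make_search(pool, lower_limit, upper_limit):
    # Filter once up front; every later call is just a cheap choice()
    filtered = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    def search():
        return np.random.choice(filtered)
    return search

search_5_50 = make_search(np.random.randint(0, 100, size=2**20), 5, 50)
search_5_50()  # repeated draws with the same limits stay fast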
The benchmark setup, in case you want to run it yourself, is here (based on a Jupyter Notebook/Lab, thus the %-symbols):
%load_ext cython

%%cython

cimport numpy as cnp
import numpy as np

cpdef int search_1(cnp.ndarray[int] pool, int lower_limit, int upper_limit):
    cdef cnp.ndarray[int] limited
    limited = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    return np.random.choice(limited)

ctypedef fused int_or_float:
    cnp.int32_t
    cnp.int64_t
    cnp.float32_t
    cnp.float64_t

cpdef int_or_float search_2(cnp.ndarray[int_or_float] pool, int_or_float lower_limit, int_or_float upper_limit):
    cdef cnp.ndarray[int_or_float] limited
    limited = pool[(pool >= lower_limit) & (pool <= upper_limit)]
    return np.random.choice(limited)

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef int_or_float search_3(cnp.ndarray[int_or_float] arr, int_or_float lower_bound, int_or_float upper_bound):
    cdef int_or_float element
    cdef Py_ssize_t num_valid = 0
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            num_valid += 1
    cdef Py_ssize_t random_index = np.random.randint(0, num_valid)
    cdef Py_ssize_t clamped_index = 0
    for index in range(arr.shape[0]):
        element = arr[index]
        if lower_bound <= element <= upper_bound:
            if clamped_index == random_index:
                return element
            clamped_index += 1

import numexpr
import numba as nb
import numpy as np

def search_numexpr(arr, l, u):
    return np.random.choice(arr[numexpr.evaluate('(arr >= l) & (arr <= u)')])

@nb.njit
def search_numba(arr, lower, upper):
    num_valids = 0
    for item in arr:
        if item >= lower and item <= upper:
            num_valids += 1
    random_index = np.random.randint(0, num_valids)
    valid_index = 0
    for item in arr:
        if item >= lower and item <= upper:
            if valid_index == random_index:
                return item
            valid_index += 1

from simple_benchmark import benchmark, MultiArgument

arguments = {2**i: MultiArgument([np.random.randint(0, 100, size=2**i, dtype=np.int_), 5, 50]) for i in range(2, 22)}
funcs = [search_1, search_2, search_3, search_numba, search_numexpr]
b = benchmark(funcs, arguments, argument_name='array size')

%matplotlib widget
import matplotlib.pyplot as plt
plt.style.use('ggplot')
b.plot()