I have a (large) length-N array of k distinct functions, and a length-N array of abscissas. I want to evaluate the functions at the abscissas to return a length-N array of ordinates, and critically, I need to do it very fast.
I have tried the following loop over a call to np.where, which is too slow:
Create some fake data to illustrate the problem:
import numpy as np

def trivial_functional(i):
    return lambda x: i * x

k = 250
func_table = [trivial_functional(j) for j in range(k)]
func_table = np.array(func_table)  # possibly unnecessary
We have a table of 250 distinct functions. Now I create a large array with many repeated entries of those functions, and a set of points of the same length at which these functions should be evaluated.
Npts = int(1e6)  # must be an int; 1e6 is a float and breaks array-shape arguments
abcissa_array = np.random.random(Npts)
# np.random.random_integers is deprecated; randint's upper bound is exclusive
function_indices = np.random.randint(0, len(func_table), Npts)
func_array = func_table[function_indices]
Finally, loop over every function used by the data and evaluate it on the set of relevant points:
desired_output = np.zeros(Npts)
for func_index in set(function_indices):
    idx = np.where(function_indices == func_index)[0]
    desired_output[idx] = func_table[func_index](abcissa_array[idx])
This loop takes ~0.35 seconds on my laptop and is the biggest bottleneck in my code by an order of magnitude.
Does anyone see how to avoid the blind lookup call to np.where? Is there a clever use of numba that can speed this loop up?
This does almost the same thing as your (excellent!) self-answer, but with a bit less rigamarole. It seems marginally faster on my machine as well -- about 30ms based on a cursory test.
def apply_indexed_fast(array, func_indices, func_table):
    # sort indices so that entries using the same function are contiguous
    func_argsort = func_indices.argsort()
    # find the boundary of each function's chunk in the sorted index array
    func_ranges = list(np.searchsorted(func_indices[func_argsort], range(len(func_table))))
    func_ranges.append(None)
    out = np.zeros_like(array)
    for f, start, end in zip(func_table, func_ranges, func_ranges[1:]):
        ix = func_argsort[start:end]
        out[ix] = f(array[ix])
    return out
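As a quick sanity check, you can run it against the setup from the question and compare with the np.where loop's output (this assumes abcissa_array, function_indices, func_table, and desired_output from above are still in scope):

fast_output = apply_indexed_fast(abcissa_array, function_indices, func_table)
print(np.allclose(fast_output, desired_output))  # should print True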
Like yours, this splits a sequence of argsort indices into chunks, each of which corresponds to a function in func_table. It then uses each chunk to select input and output indices for its corresponding function. To determine the chunk boundaries, it uses np.searchsorted instead of np.unique -- where searchsorted(a, b) could be thought of as a binary search algorithm that returns the index of the first value in a equal to or greater than the given value or values in b.
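For a concrete illustration (with made-up values), here is searchsorted computing the chunk boundaries for a small sorted array of function indices:

sorted_indices = np.array([0, 0, 1, 1, 1, 2])
print(np.searchsorted(sorted_indices, range(3)))  # prints [0 2 5]
# i.e. function 0 occupies slots 0:2, function 1 slots 2:5, function 2 slots 5 onward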
Then the zip function simply iterates over its arguments in parallel, returning a single item from each one, collected together in a tuple, and stringing those together into a list. (So zip([1, 2, 3], ['a', 'b', 'c'], ['b', 'c', 'd']) returns [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd')].) This, along with the for statement's built-in ability to "unpack" those tuples, allows for a terse but expressive way to iterate over multiple sequences in parallel.
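To make the unpacking concrete, a tiny sketch using the same toy inputs:

for number, letter_a, letter_b in zip([1, 2, 3], ['a', 'b', 'c'], ['b', 'c', 'd']):
    print('%s %s %s' % (number, letter_a, letter_b))  # 1 a b / 2 b c / 3 c d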
In this case, I've used it to iterate over the functions in func_table alongside two out-of-sync copies of func_ranges. This ensures that the item from func_ranges in the end variable is always one step ahead of the item in the start variable. By appending None to func_ranges, I ensure that the final chunk is handled gracefully -- zip stops when any one of its arguments runs out of items, which cuts off the final value in the sequence. Conveniently, the None value also serves as an open-ended slice index!
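So, with hypothetical boundaries [0, 2, 5] and the appended None, the start/end pairing works out like this:

func_ranges = [0, 2, 5, None]
print(list(zip(func_ranges, func_ranges[1:])))  # [(0, 2), (2, 5), (5, None)]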
Another trick that does the same thing requires a few more lines, but has lower memory overhead, especially when used with the itertools equivalent of zip, izip:
import itertools

range_iter_a = iter(func_ranges)  # create iterators that step over the
range_iter_b = iter(func_ranges)  # values in `func_ranges` without making copies
next(range_iter_b, None)          # advance the second iterator by one
for f, start, end in itertools.izip(func_table, range_iter_a, range_iter_b):
    ...
However, these low-overhead generator-based approaches can sometimes be a bit slower than vanilla lists. Also, note that in Python 3, zip behaves more like izip.
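For reference, a minimal sketch of the same trick in Python 3, where izip is gone and the built-in zip is already lazy:

range_iter_a = iter(func_ranges)
range_iter_b = iter(func_ranges)
next(range_iter_b, None)  # advance the second iterator by one
for f, start, end in zip(func_table, range_iter_a, range_iter_b):
    ...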