I have these two numpy arrays in Python:
a = np.array(sorted(np.random.rand(6)*6)) # It is sorted.
b = np.array(np.random.rand(3)*6)
Say that the arrays are
a = array([0.27148588, 0.42828064, 2.48130785, 4.01811243, 4.79403723, 5.46398145])
b = array([0.06231266, 1.64276013, 5.22786201])
I want to produce an array containing the indices where a
is <=
than each element in b
, i.e. I want exactly this:
np.argmin(np.array([a<b_i for b_i in b]),1)-1
which produces array([-1, 1, 4])
meaning that b[0]<a[0]
, a[1]<b[1]<a[2]
and a[4]<b[2]<a[5]
.
Is there any native numpy fast vectorized way of doing this avoiding the for
loop?
To answer your specific question, i.e., a vectorized way to get the equivalent of np.array([a<b_i for b_i in b]
, you can take advantage of broadcasting, here, you could use:
a[None, ...] < b[..., None]
So:
>>> a[None, ...] < b[..., None]
array([[False, False, False, False, False, False],
[ True, True, False, False, False, False],
[ True, True, True, True, True, False]])
Importantly, for broadcasting:
>>> a[None, ...].shape, b[..., None].shape
((1, 6), (3, 1))
Here's the link to the official numpy docs to understand broadcasting. Some relevant tidbits:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
they are equal, or
one of them is 1
...
When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.
As noted in the comments under your question, using an entirely different approach is much better algorithmically than your own, brute force solution, namely, taking advantage of binary search, using np.searchsorted