Search code examples
pythonnumpycomparisonvectorization

Is it possible to vectorize this numpy array comparison?


I have these two numpy arrays in Python:

a = np.array(sorted(np.random.rand(6)*6)) # It is sorted.
b = np.array(np.random.rand(3)*6)

Say that the arrays are

a = array([0.27148588, 0.42828064, 2.48130785, 4.01811243, 4.79403723, 5.46398145])
b = array([0.06231266, 1.64276013, 5.22786201])

I want to produce an array containing the indices where a is <= than each element in b, i.e. I want exactly this:

np.argmin(np.array([a<b_i for b_i in b]),1)-1

which produces array([-1, 1, 4]) meaning that b[0]<a[0], a[1]<b[1]<a[2] and a[4]<b[2]<a[5].

Is there any native numpy fast vectorized way of doing this avoiding the for loop?


Solution

  • To answer your specific question, i.e., a vectorized way to get the equivalent of np.array([a<b_i for b_i in b], you can take advantage of broadcasting, here, you could use:

    a[None, ...] < b[..., None]
    

    So:

    >>> a[None, ...] < b[..., None]
    array([[False, False, False, False, False, False],
           [ True,  True, False, False, False, False],
           [ True,  True,  True,  True,  True, False]])
    

    Importantly, for broadcasting:

    >>> a[None, ...].shape,  b[..., None].shape
    ((1, 6), (3, 1))
    

    Here's the link to the official numpy docs to understand broadcasting. Some relevant tidbits:

    When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

    1. they are equal, or

    2. one of them is 1

    ...

    When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

    Edit

    As noted in the comments under your question, using an entirely different approach is much better algorithmically than your own, brute force solution, namely, taking advantage of binary search, using np.searchsorted