Search code examples
pythonnumpylist-comprehension

Efficient solution to find list indices greater than elements in a second list


This question is linked to this: First Python list index greater than x?

I have a (sorted) list of floats, and I want to find the first index that exceeds each value of a second list

e.g.

 l=[0.2,0.3,0.7,0.9]
 m=[0.25,0.6]

if m were a float I would use this:

 bisect.bisect_left(l,m)

But for the case where m is a list this fails, and I can only think to employ a list comprehension:

[bisect.bisect_left(l,i) for i in m]

which gives:

 [1, 2]

which works, but I want to speed it up for large lists in my real example by avoiding the list comprehension as my tests showed this was the "bottleneck" operation (I earlier stated I suspected it was too slow). Is there a way to efficiently do this using a vectorized function in e.g. numpy or an improved algorithm (as only one traverse of the list is required)?


Solution

  • So I found there is a numpy function to perform this task, np.searchsorted. which is much faster than the use of list comprehensions.

    result=np.searchsorted(searchlist,newindices)
    

    These are the timings for the various solutions:

    1. Standard List comprehension:

    this was my first attempt at a solution

    python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,i) for i in n]"
    

    200 loops, best of 5: 1.61 msec per loop

    2. Shortened search in for loop

    This was the solution kindly provided by @lenik

    python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,n[0])]" "for i in n[1:]:" "    r.append(bisect.bisect_left(h,i,r[-1]))"
    

    200 loops, best of 5: 1.6 msec per loop

    Hardly different from the list comprehension which I was somewhat surprised about...

    3. Numpy searchsorted

    python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=np.searchsorted(h,n)"
    

    10000 loops, best of 5: 33.6 usec per loop

    Approximately 50 times faster than the list comprehension based solutions for this example, so hands down the fastest.