This question is linked to this: First Python list index greater than x?
I have a (sorted) list of floats, and I want to find the first index that exceeds each value of a second list
e.g.
l=[0.2,0.3,0.7,0.9]
m=[0.25,0.6]
if m were a float I would use this:
bisect.bisect_left(l,m)
But for the case where m is a list this fails, and I can only think to employ a list comprehension:
[bisect.bisect_left(l,i) for i in m]
which gives:
[1, 2]
which works, but I want to speed it up for large lists in my real example by avoiding the list comprehension as my tests showed this was the "bottleneck" operation (I earlier stated I suspected it was too slow). Is there a way to efficiently do this using a vectorized function in e.g. numpy or an improved algorithm (as only one traverse of the list is required)?
So I found there is a numpy function to perform this task, np.searchsorted. which is much faster than the use of list comprehensions.
result=np.searchsorted(searchlist,newindices)
These are the timings for the various solutions:
1. Standard List comprehension:
this was my first attempt at a solution
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,i) for i in n]"
200 loops, best of 5: 1.61 msec per loop
2. Shortened search in for loop
This was the solution kindly provided by @lenik
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=[bisect.bisect_left(h,n[0])]" "for i in n[1:]:" " r.append(bisect.bisect_left(h,i,r[-1]))"
200 loops, best of 5: 1.6 msec per loop
Hardly different from the list comprehension which I was somewhat surprised about...
3. Numpy searchsorted
python3 -m timeit -s "import numpy as np" -s "import bisect" -s "h=np.sort(np.random.uniform(size=10000))" -s "n=np.sort(np.random.uniform(size=1000))" "r=np.searchsorted(h,n)"
10000 loops, best of 5: 33.6 usec per loop
Approximately 50 times faster than the list comprehension based solutions for this example, so hands down the fastest.