I have a large data frame (~1 million rows). I am ultimately going to be interpolating spectra between two age groups. However I need to first find the nearest values above and below any age I need to find.
The DataFrame briefly looks like this
Age Wavelength Luminosity
1
1
1
4
4
6
6
I need to be able to input 5, and return values 4 and 6. I am struggling to find a way to do this? This is what I tried :
def findnearest(array,value):
idx = np.searchsorted(array,value, side='left')
if idx > 125893.0:
return array[idx]
else:
return array[idx]
idx1 = np.searchsorted(array,value, side='right')
if idx1 < 2e10:
return array[idx1]
else:
return array [idx1-1]
C = findnearest(m05_010['age'], 5.12e7)
print(C)
This only returns one value, and not both. Is this the right path or should I be doing something different? Is there a better way?
I think you should use bisect
, its a lot faster and is made for this purpose only.
from bisect import *
arr = np.array([1,1,1,4,4,4,4,4,4,4,6,6])
value = 5
lower = arr[bisect_left(arr, value) - 1]
above = arr[bisect_right(arr, value)]
lower, above
Output -
(4, 6)
Heres the time comparison from Ipython -
%timeit for x in range(100): arr[bisect_left(arr, value)]
Output -
10000 loops, best of 3: 92.4 µs per loop
And using searchsorted
-
%timeit for x in range(100): arr[np.searchsorted(arr,value,'left')-1]
Output -
The slowest run took 7.62 times longer than the fastest. This could
mean that an intermediate result is being cached. 10000 loops, best of 3: 142 µs per loop