Search code examples
pythonsortingpandasdataframeastronomy

Python/Pandas find closest value above/below in one Column


I have a large data frame (~1 million rows). I am ultimately going to be interpolating spectra between two age groups. However I need to first find the nearest values above and below any age I need to find.

The DataFrame briefly looks like this

 Age   Wavelength   Luminosity
 1     
 1
 1
 4
 4
 6
 6

I need to be able to input 5, and return values 4 and 6. I am struggling to find a way to do this? This is what I tried :

def findnearest(array,value):
    idx = np.searchsorted(array,value, side='left')
    if idx > 125893.0:
        return array[idx]
    else:
        return array[idx]
    idx1 = np.searchsorted(array,value, side='right')
    if idx1 < 2e10:
        return array[idx1]
    else:
        return array [idx1-1]

C = findnearest(m05_010['age'], 5.12e7)

print(C)

This only returns one value, and not both. Is this the right path or should I be doing something different? Is there a better way?


Solution

  • I think you should use bisect, its a lot faster and is made for this purpose only.

    from bisect import *
    arr = np.array([1,1,1,4,4,4,4,4,4,4,6,6])
    
    value = 5
    lower = arr[bisect_left(arr, value) - 1]
    above = arr[bisect_right(arr, value)]
    lower, above
    

    Output -

    (4, 6)
    

    Heres the time comparison from Ipython -

    %timeit for x in range(100): arr[bisect_left(arr, value)]
    

    Output -

    10000 loops, best of 3: 92.4 µs per loop

    And using searchsorted -

    %timeit for x in range(100): arr[np.searchsorted(arr,value,'left')-1]
    

    Output -

    The slowest run took 7.62 times longer than the fastest. This could
    mean that an intermediate result is being cached. 10000 loops, best of 3: 142 µs per loop