python pandas series median

How to get the N nearest entries to the median in a Pandas series?


For a Pandas Series:

import pandas as pd
ser = pd.Series([i**2 for i in range(9)])
print(ser)
0     0
1     1
2     4
3     9
4    16
5    25
6    36
7    49
8    64
dtype: int64

The median can be obtained with ser.median(), which returns 16.0. How can the N entries closest to the median be retrieved? Something like:

print(ser.get_median_entries(3)) # N == 3; not real functionality
3     9
4    16
5    25
dtype: int64

Solution

  • Compute the absolute difference between each value and the median, sort those differences with sort_values(), and use the index of the N smallest to select from the original series:

    ser[abs(ser - ser.median()).sort_values().iloc[:3].index]
    #4    16
    #3     9
    #5    25
    #dtype: int64
    

    If you want it as a function, where n is an input variable:

    def get_n_closest_to_median(ser, n):
        # Sort by distance from the median, then keep the n nearest labels
        return ser[abs(ser - ser.median()).sort_values().iloc[:n].index]
    
    print(get_n_closest_to_median(ser, 3))
    #4    16
    #3     9
    #5    25
    #dtype: int64
    

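    You will probably want to validate n: a value larger than len(ser) silently returns the whole series, and a negative n drops entries from the far end of the sorted differences instead of raising an error.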
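
One possible way to add that checking is sketched below. It is only an illustration, not part of the original answer: the keep_index_order parameter is a hypothetical addition that returns the result in the series' original order (as in the question's desired output), the choice to raise ValueError for an out-of-range n is an assumption, and the code assumes the series has unique index labels.

```python
import pandas as pd


def get_n_closest_to_median(ser, n, keep_index_order=False):
    """Return the n entries of `ser` whose values lie closest to its median.

    `keep_index_order` is a hypothetical convenience flag (not in the
    original answer). Assumes the index labels of `ser` are unique.
    """
    # Assumed policy: reject invalid n instead of silently truncating.
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    if n > len(ser):
        raise ValueError("n exceeds the number of entries in the series")

    # Absolute distance of each value from the median.
    distance = (ser - ser.median()).abs()
    # mergesort is stable, so on ties the earlier entry is kept first.
    closest = distance.sort_values(kind="mergesort").index[:n]

    result = ser.loc[closest]
    # Optionally restore the original index order of the selected entries.
    return result.sort_index() if keep_index_order else result


ser = pd.Series([i**2 for i in range(9)])
print(get_n_closest_to_median(ser, 3, keep_index_order=True))
```

With keep_index_order=True the three selected entries come back at index 3, 4, 5 (values 9, 16, 25); with the default they are ordered by distance from the median, as in the answer above.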