For a Pandas Series:
import pandas as pd
ser = pd.Series([i**2 for i in range(9)])
print(ser)
0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
dtype: int64
The median can be grabbed with ser.median(), which returns 16.0. How can the N entries around the median be grabbed? Something like:
print(ser.get_median_entries(3)) # N == 3; not real functionality
3 9
4 16
5 25
dtype: int64
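You can compute the absolute difference between each value and the median, sort those differences with sort_values(), and use the index of the first three to select from the original Series: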
ser[abs(ser - ser.median()).sort_values().iloc[:3].index]
#4 16
#3 9
#5 25
#dtype: int64
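Note that the result above is ordered by distance from the median (4, 3, 5), while the question's desired output is in index order (3, 4, 5). A minimal sketch of one way to get that ordering, assuming the index labels are unique; it swaps in nsmallest() for the sort-and-slice step, and the names dist and closest are only illustrative:

```python
import pandas as pd

ser = pd.Series([i**2 for i in range(9)])

# Distance of every value from the median (16.0 for this Series).
dist = (ser - ser.median()).abs()

# Take the labels of the 3 smallest distances, select them from the
# original Series, then restore the original index order.
closest = ser.loc[dist.nsmallest(3).index].sort_index()
print(closest)
```

Dropping the final sort_index() call should give the same entries ordered by distance instead.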
If you want it as a function, where n is an input variable:
def get_n_closest_to_median(ser, n):
    # Sort by distance from the median and keep the n nearest entries.
    return ser[abs(ser - ser.median()).sort_values().iloc[:n].index]

print(get_n_closest_to_median(ser, 3))
#4 16
#3 9
#5 25
#dtype: int64
You will probably want to add some error checking on n, for example for negative values or values larger than len(ser).
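As a rough sketch of what that checking might look like: the function name get_n_closest_to_median_checked and the choice to raise ValueError (rather than, say, clamping n) are assumptions made for illustration, and the code assumes the Series has unique index labels. A stable sort is used so that values equally far from the median keep their original relative order.

```python
import pandas as pd

def get_n_closest_to_median_checked(ser, n):
    """Return the n entries of ser closest to its median (hypothetical helper)."""
    # Reject requests that cannot be satisfied instead of silently truncating.
    if n < 0:
        raise ValueError("n must be non-negative")
    if n > len(ser):
        raise ValueError("n cannot exceed the length of the Series")
    # Absolute distance from the median, sorted with a stable algorithm.
    dist = (ser - ser.median()).abs().sort_values(kind="mergesort")
    # Use the labels of the n smallest distances to select from ser.
    return ser.loc[dist.iloc[:n].index]

ser = pd.Series([i**2 for i in range(9)])
print(get_n_closest_to_median_checked(ser, 3))
```

Whether an out-of-range n should raise or simply return fewer entries depends on the caller; raising is just one possible policy.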