I am trying to find the median of values within a bin range generated by the np.histrogram
function. How would I select the values only within the bin range and operate on those specific values? Below is an example of my data and what I am trying to do:
x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
y values can have any sort of x value associated with them, for example:
hist, bins = np.histogram(x)
hist = [129, 126, 94, 133, 179, 206, 142, 147, 90, 185]
bins = [0., 0.09999926, 0.19999853, 0.29999779, 0.39999706,
0.49999632, 0.59999559, 0.69999485, 0.79999412, 0.8999933,
0.99999265]
So, I am trying to find the median y value of the 129 values in the first bin generated, etc.
One way is with pandas.cut()
:
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)
>>> x = np.random.randint(0, 25, size=100)
>>> _, bins = np.histogram(x)
>>> pd.Series(x).groupby(pd.cut(x, bins)).median()
(0.0, 2.4] 2.0
(2.4, 4.8] 3.0
(4.8, 7.2] 6.0
(7.2, 9.6] 8.5
(9.6, 12.0] 10.5
(12.0, 14.4] 13.0
(14.4, 16.8] 15.5
(16.8, 19.2] 18.0
(19.2, 21.6] 20.5
(21.6, 24.0] 23.0
dtype: float64
If you want to stay in NumPy, you might want to check out np.digitize()
.