I want to pass the numpy percentile() function through pandas' agg() function, as I do below with various other numpy statistics functions.
Right now I have a dataframe that looks like this:
AGGREGATE MY_COLUMN
A 10
A 12
B 5
B 9
A 84
B 22
And my code looks like this:
import numpy as np

grouped = dataframe.groupby('AGGREGATE')
column = grouped['MY_COLUMN']
column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max])
The above code works, but I want to do something like
column.agg([np.sum, np.mean, np.percentile(50), np.percentile(95)])
I.e., specify various percentiles to return from agg().
How should this be done?
Perhaps not super efficient, but one way would be to create a function yourself:
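def percentile(n):
    # n is a percentile in the range 0..100, e.g. percentile(95)
    def percentile_(x):
        # Series.quantile expects a fraction in [0, 1], so rescale n
        return x.quantile(n / 100)
    # agg() uses __name__ as the column label in the result
    percentile_.__name__ = 'percentile_{:02.0f}'.format(n)
    return percentile_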
Then include this in your agg:
In [11]: column.agg([np.sum, np.mean, np.std, np.median,
                     np.var, np.min, np.max, percentile(50), percentile(95)])
Out[11]:
           sum       mean        std  median          var  amin  amax  percentile_50  percentile_95
AGGREGATE
A          106  35.333333  42.158431      12  1777.333333    10    84             12           76.8
B           36  12.000000   8.888194       9    79.000000     5    22              9           20.7
Not sure this is how it should be done though...
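For what it's worth, here is a self-contained sketch of the same idea that rebuilds the sample frame from the question, so it can be run as-is. It assumes percentile() takes a value in the range 0..100 and rescales it for Series.quantile; the string names 'sum' and 'mean' are used in place of the numpy functions purely to keep the example short.

```python
import pandas as pd


def percentile(n):
    """Build an aggregation function for the n-th percentile (n in 0..100)."""
    def percentile_(x):
        # Series.quantile expects a fraction in [0, 1], so rescale n
        return x.quantile(n / 100)
    # agg() labels the result column with the function's __name__
    percentile_.__name__ = 'percentile_{:02.0f}'.format(n)
    return percentile_


# Sample data copied from the question
dataframe = pd.DataFrame({
    'AGGREGATE': ['A', 'A', 'B', 'B', 'A', 'B'],
    'MY_COLUMN': [10, 12, 5, 9, 84, 22],
})

column = dataframe.groupby('AGGREGATE')['MY_COLUMN']
result = column.agg(['sum', 'mean', percentile(50), percentile(95)])
print(result)
```

Because each call to percentile() returns a function with its own __name__, several percentiles can sit in the same agg list without their column labels colliding.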