Pass percentiles to pandas agg function


I want to pass the numpy percentile() function through pandas' agg() function as I do below with various other numpy statistics functions.

Right now I have a dataframe that looks like this:

AGGREGATE   MY_COLUMN
A           10
A           12
B           5
B           9
A           84
B           22

And my code looks like this:

grouped = dataframe.groupby('AGGREGATE')
column = grouped['MY_COLUMN']
column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max])

The above code works, but I want to do something like this:

column.agg([np.sum, np.mean, np.percentile(50), np.percentile(95)])

I.e., specify various percentiles to return from agg().

How should this be done?


Solution

  • Perhaps not super efficient, but one way would be to create a function yourself:

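    def percentile(n):
        # n is a percentile in [0, 100]; Series.quantile expects a fraction in [0, 1]
        def percentile_(x):
            return x.quantile(n / 100)
        # Name the function so the agg output column is labelled, e.g. percentile_50
        percentile_.__name__ = 'percentile_{:02.0f}'.format(n)
        return percentile_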
    

    Then include this in your agg:

    In [11]: column.agg([np.sum, np.mean, np.std, np.median,
                         np.var, np.min, np.max, percentile(50), percentile(95)])
    Out[11]:
               sum       mean        std  median          var  amin  amax  percentile_50  percentile_95
    AGGREGATE
    A          106  35.333333  42.158431      12  1777.333333    10    84             12           76.8
    B           36  12.000000   8.888194       9    79.000000     5    22              9           20.7
    

    Not sure this is how it should be done though...
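
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes the `percentile(n)` helper takes `n` on a 0 to 100 scale and converts it to the fraction that `Series.quantile` expects. The string names `"sum"` and `"mean"` are used in place of the numpy functions purely for brevity; the `result` variable name is just illustrative.

```python
import pandas as pd

# Sample data from the question.
dataframe = pd.DataFrame({
    "AGGREGATE": ["A", "A", "B", "B", "A", "B"],
    "MY_COLUMN": [10, 12, 5, 9, 84, 22],
})


def percentile(n):
    """Return an aggregation function for the n-th percentile (n in 0..100)."""
    def percentile_(x):
        # Series.quantile takes a fraction in [0, 1], so rescale n.
        return x.quantile(n / 100)
    # agg() labels the output column with the function's __name__.
    percentile_.__name__ = "percentile_{:02.0f}".format(n)
    return percentile_


column = dataframe.groupby("AGGREGATE")["MY_COLUMN"]
result = column.agg(["sum", "mean", percentile(50), percentile(95)])
print(result)
```

With the default linear interpolation, group A (10, 12, 84) should give 12 and 76.8 for the 50th and 95th percentiles, and group B (5, 9, 22) should give 9 and 20.7.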