Search code examples
pythonpandasaggregatehistogramfrequency

Pandas/python analyse frequency histogram by a column and aggregate another column to those buckets


I am using pandas with a dataframe like below:

Name percent Amount
A 3 34
B 5 200
C 30 20
D 1 12

I want to create buckets for the percent column such as 0-5, 6-15, >16. With these buckets I record both the count of percent column (which is effectively a histogram) but also record the average of Amount in that same bucket.

Using the above example:

Bucket percent count Avg. Amount
5 3 82
15 0 0
>15 1 20

How can I achieve this in python and pandas (or any other library)


Solution

  • You need to use pandas.cut and groupby+agg:

    (df.assign(Bucket=pd.cut(df['percent '],
                             [0, 5, 15, float('inf')],
                             labels=['0-5', '6-15', '>15']))
       .groupby('Bucket').agg(**{'percent count': ('percent ', 'count'),
                                'Avg. Amount': ('Amount', 'mean')
                               })
       .fillna(0, downcast='infer')
       .reset_index()
    )
    

    output:

      Bucket  percent count  Avg. Amount
    0    0-5              3           82
    1   6-15              0            0
    2    >15              1           20