python pandas aggregate histogram frequency

Pandas/python analyse frequency histogram by a column and aggregate another column to those buckets

I am using pandas with a dataframe like below:

Name	percent	Amount
A	3	34
B	5	200
C	30	20
D	1	12

I want to create buckets for the percent column such as 0-5, 6-15, >16. With these buckets I record both the count of percent column (which is effectively a histogram) but also record the average of Amount in that same bucket.

Using the above example:

Bucket	percent count	Avg. Amount
5	3	82
15	0	0
>15	1	20

How can I achieve this in python and pandas (or any other library)

Solution

You need to use pandas.cut and groupby+agg:

(df.assign(Bucket=pd.cut(df['percent '],
                         [0, 5, 15, float('inf')],
                         labels=['0-5', '6-15', '>15']))
   .groupby('Bucket').agg(**{'percent count': ('percent ', 'count'),
                            'Avg. Amount': ('Amount', 'mean')
                           })
   .fillna(0, downcast='infer')
   .reset_index()
)

output:

  Bucket  percent count  Avg. Amount
0    0-5              3           82
1   6-15              0            0
2    >15              1           20