I am using pandas with a dataframe like below:
Name | percent | Amount |
---|---|---|
A | 3 | 34 |
B | 5 | 200 |
C | 30 | 20 |
D | 1 | 12 |
I want to create buckets for the percent
column such as 0-5
, 6-15
, >16
. With these buckets I record both the count of percent
column (which is effectively a histogram) but also record the average of Amount
in that same bucket.
Using the above example:
Bucket | percent count | Avg. Amount |
---|---|---|
5 | 3 | 82 |
15 | 0 | 0 |
>15 | 1 | 20 |
How can I achieve this in python
and pandas
(or any other library)
You need to use pandas.cut
and groupby
+agg
:
(df.assign(Bucket=pd.cut(df['percent '],
[0, 5, 15, float('inf')],
labels=['0-5', '6-15', '>15']))
.groupby('Bucket').agg(**{'percent count': ('percent ', 'count'),
'Avg. Amount': ('Amount', 'mean')
})
.fillna(0, downcast='infer')
.reset_index()
)
output:
Bucket percent count Avg. Amount
0 0-5 3 82
1 6-15 0 0
2 >15 1 20