I have a DataFrame with column "ages" and column "professional qualification", like this:
ages | professional qualification |
---|---|
45 | labourer |
49 | labourer |
29 | labourer |
61 | labourer |
45 | labourer |
37 | labourer |
17 | office worker |
56 | labourer |
47 | office worker |
I want to group the ages like this ( ,17), (17,29), (30,40), (40,50), (50, ) and, with these ages grouped I would to create a frequency table indicating on each age group what professional qualification appears more often.
Example:
ages | professional qualification |
---|---|
(,17) | office worker |
(17,29) | labourer |
(30,40) | labourer |
(40,50) | labourer |
etc, etc, etc. The people who have an age between 40 and 50 (excluding 40) are mostly labourers
All solutions will be appreciated.
Use cut
with aggregate by GroupBy.agg
custom function by Series.mode
with select first element:
bins = [0,17,29,40,50,70,100]
f = lambda x: x.mode().iat[0]
df1 = (df.groupby(pd.cut(df['ages'], bins=bins))['professional qualification']
.agg(f)
.reset_index())
print (df1)
ages professional qualification
0 (0, 17] office worker
1 (17, 29] labourer
2 (29, 40] labourer
3 (40, 50] labourer
4 (50, 70] labourer
5 (70, 100] None