Search code examples
pythonpandasdataframegroupingfrequency

Grouping age in Pandas (Python)


I have a DataFrame with column "ages" and column "professional qualification", like this:

ages professional qualification
45 labourer
49 labourer
29 labourer
61 labourer
45 labourer
37 labourer
17 office worker
56 labourer
47 office worker

I want to group the ages like this ( ,17), (17,29), (30,40), (40,50), (50, ) and, with these ages grouped I would to create a frequency table indicating on each age group what professional qualification appears more often.

Example:

ages professional qualification
(,17) office worker
(17,29) labourer
(30,40) labourer
(40,50) labourer

etc, etc, etc. The people who have an age between 40 and 50 (excluding 40) are mostly labourers

All solutions will be appreciated.


Solution

  • Use cut with aggregate by GroupBy.agg custom function by Series.mode with select first element:

    bins = [0,17,29,40,50,70,100]
    
    f = lambda x: x.mode().iat[0]
    df1 = (df.groupby(pd.cut(df['ages'], bins=bins))['professional qualification']
             .agg(f)
             .reset_index())
    print (df1)
            ages professional qualification
    0    (0, 17]              office worker
    1   (17, 29]                   labourer
    2   (29, 40]                   labourer
    3   (40, 50]                   labourer
    4   (50, 70]                   labourer
    5  (70, 100]                       None