Search code examples
pythonpandasmax

get max value after grouping two columns pandas


I want to get the trip_time with the highest count for each age_group

age_group trip_time counts
18 - 30yrs 01am 23
18 - 30yrs 02am 2
18 - 30yrs 03am 213
31 - 50yrs 01am 74
31 - 50yrs 02am 211
31 - 50yrs 03am 852
51 - 70yrs 01am 23
51 - 70yrs 02am 11
51 - 70yrs 03am 101

Expected output:

age_group trip_time counts
18 - 30yrs 03am 213
31 - 50yrs 03am 852
51 - 70yrs 03am 101
trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()

But it gives me wrong result


Solution

  • groupby 'age-group' and then use transform to get the max count for each age-group, which you then compare with the count of the DF, to get the resultset

    df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]
    
        age_group trip_time  counts
    2  18 - 30yrs      03am     213
    5  31 - 50yrs      03am     852
    8  51 - 70yrs      03am     101