Search code examples
pythonpandasassign

Pandas - new column based on `max` of grouped values


I have a Pandas dataframe with multiple groups in it, A, B, C. Each group has multiple counts associated with it and I want to create a new column that is normalised to the max value of each group.

i.e.

index, group, year, count
0, A, 2015, 1
1, A, 2016, 2
2, A, 2017, 3
3, B, 2012, 10
4, B, 2013, 14
5, B, 2014, 18
6, C, 2014, 55
7, C, 2015, 59
8, C, 2016, 58

...becomes

index, group, year, count, normalised
0, A, 2015, 1,  0.333
1, A, 2016, 2,  0.667
2, A, 2017, 3,  1.000
3, B, 2012, 10, 0.557
4, B, 2013, 14, 0.778
5, B, 2014, 18, 1.000
6, C, 2014, 55, 0.932
7, C, 2015, 59, 1.000
8, C, 2016, 58, 0.983

If I try something like...

df.assign(normalised=lambda x: x['count']/df[df['group'] == x['group']]['count'].max()

then max will return 59 rather than the largest number within the category


Solution

  • You can use groupby + transform to calculate the ratio between current value and maximum value in each group:

    df['normalised'] = df['count'].groupby(df.group).transform(lambda x: x / x.max())
    
    df
       index group  year  count  normalised
    0      0     A  2015      1    0.333333
    1      1     A  2016      2    0.666667
    2      2     A  2017      3    1.000000
    3      3     B  2012     10    0.555556
    4      4     B  2013     14    0.777778
    5      5     B  2014     18    1.000000
    6      6     C  2014     55    0.932203
    7      7     C  2015     59    1.000000
    8      8     C  2016     58    0.983051