I have a Pandas dataframe with multiple groups in it, A, B, C. Each group has multiple counts associated with it and I want to create a new column that is normalised to the max value of each group.
i.e.
index, group, year, count
0, A, 2015, 1
1, A, 2016, 2
2, A, 2017, 3
3, B, 2012, 10
4, B, 2013, 14
5, B, 2014, 18
6, C, 2014, 55
7, C, 2015, 59
8, C, 2016, 58
...becomes
index, group, year, count, normalised
0, A, 2015, 1, 0.333
1, A, 2016, 2, 0.667
2, A, 2017, 3, 1.000
3, B, 2012, 10, 0.557
4, B, 2013, 14, 0.778
5, B, 2014, 18, 1.000
6, C, 2014, 55, 0.932
7, C, 2015, 59, 1.000
8, C, 2016, 58, 0.983
If I try something like...
df.assign(normalised=lambda x: x['count']/df[df['group'] == x['group']]['count'].max()
then max
will return 59
rather than the largest number within the category
You can use groupby
+ transform
to calculate the ratio between current value and maximum value in each group:
df['normalised'] = df['count'].groupby(df.group).transform(lambda x: x / x.max())
df
index group year count normalised
0 0 A 2015 1 0.333333
1 1 A 2016 2 0.666667
2 2 A 2017 3 1.000000
3 3 B 2012 10 0.555556
4 4 B 2013 14 0.777778
5 5 B 2014 18 1.000000
6 6 C 2014 55 0.932203
7 7 C 2015 59 1.000000
8 8 C 2016 58 0.983051