Suppose I have a dataframe like:
Group Values
0 1 1
1 1 4
2 1 2
3 1 7
4 1 3
5 2 4
6 2 1
7 2 5
8 2 12
9 2 4
10 2 10
11 3 2
12 3 6
13 3 20
14 3 15
MRE:
df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, 2, 7, 3, 4, 1, 5, 12, 4, 10, 2, 6, 20, 15]})
Notice in the dataframe that the group 1 has max value of 7, while group 2 has 12 and group 3 has 20. Now I want to normalize each Group
's Values
to a cap value of 10.
I tried to use pd.groupby
method but I'm left with no clue on how to proceed further. Plus I know I could use a for
loop but this would be very inefficient as I have about 20k samples in the data I'm trying to work with.
Is there a sweet and subtle method which can do the trick?
You can use groupby
with transform
for this. There's a similar example in the docs of transform()
:
import pandas as pd
df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, 2, 7, 3, 4, 1, 5, 12, 4, 10, 2, 6, 20, 15]})
df['normal'] = df.groupby('Group').transform(lambda x: (x / x.max()) * 10)
print(df)
Prints:
Group Values normal
0 1 1 1.428571
1 1 4 5.714286
2 1 2 2.857143
3 1 7 10.000000
4 1 3 4.285714
5 2 4 3.333333
6 2 1 0.833333
7 2 5 4.166667
8 2 12 10.000000
9 2 4 3.333333
10 2 10 8.333333
11 3 2 1.000000
12 3 6 3.000000
13 3 20 10.000000
14 3 15 7.500000