Search code examples
pythonpandasgroup-byunique

How to create new column based on an equation and unique values from other column?


Imagine I have a dataframe like this one below. I would like to create a new column df['b'] with a specific equation that takes the maximum and minimum values of df['a']. The equation should be something like this:

import pandas as pd
df = pd.DataFrame({'a':[0.3, 0.1, 0.7, 0.5, 0.4, 0.3, 0.1, 0.6, 0.8, 0.2, 0.2],
                   'group':[1, 1, 3, 3, 5, 5, 3, 3, 6, 6, 1]})

equation = (df['a'] - df['a'].min()) / (df['a'].max() - df['a'].min())

Although, these maximum and minimum values should be related to the unique values in df['group']. So, we should get the max and min values for group 1, 3, 5 and 6 and then apply the equation on the related row of df['a'].

I managed to separate these values, but I don't know how to reproduce this idea.

a_max = df.groupby('group')['a'].max()
a_min = df.groupby('group')['a'].min()

The output should look like this:

     a     group   b
0   0.3      1     1
1   0.1      1     0
2   0.7      3     1
3   0.5      3     0.67
4   0.4      5     1
5   0.3      5     0
6   0.1      3     0
7   0.6      3     0.6
8   0.8      6     1
9   0.2      6     0
10  0.2      1     0.5

Solution

  • We can precompute the max/min by group:

    groups = df.groupby('group')['a']
    amax, amin = groups.transform('max'), groups.transform('min')
    
    df['b'] = (df['a']-amin)/(amax-amin)
    

    Or use a custom function:

    df['b'] = df.groupby('group')['a'].apply(lambda x: (x-x.min())/(x.max()-x.min()) )
    

    The first approach is slightly more performant, while the second is shorter in terms of code.

    Both would output:

          a  group         b
    0   0.3      1  1.000000
    1   0.1      1  0.000000
    2   0.7      3  1.000000
    3   0.5      3  0.666667
    4   0.4      5  1.000000
    5   0.3      5  0.000000
    6   0.1      3  0.000000
    7   0.6      3  0.833333
    8   0.8      6  1.000000
    9   0.2      6  0.000000
    10  0.2      1  0.500000