
Using nested groupby commands to transform a column in pandas


I have the following simple dataframe.

import pandas as pd

data = [['Cntrl', 0.844961, '201030'],
        ['Cntrl', 0.815331, '201030'],
        ['Exp', 1.530297, '201030'],
        ['Exp', 2.36416, '201030'],
        ['Cntrl', 1.867359, '201024'],
        ['Cntrl', 1.274295, '201024'],
        ['Exp', 0.7058, '201024'],
        ['Exp', 2.657393, '201024'],]

df = pd.DataFrame(data, columns=['Group', 'Value', 'Date'])
df

    Group   Value       Date
0   Cntrl   0.844961    201030
1   Cntrl   0.815331    201030
2   Exp     1.530297    201030
3   Exp     2.364160    201030
4   Cntrl   1.867359    201024
5   Cntrl   1.274295    201024
6   Exp     0.705800    201024
7   Exp     2.657393    201024

I would like to divide the values for each date by the minimum Cntrl value for that date, producing:


    Group   Value   Date
0   Cntrl   1.0363  201030
1   Cntrl   1.0000  201030
2   Exp     1.8769  201030
3   Exp     2.8996  201030
4   Cntrl   1.4654  201024
5   Cntrl   1.0000  201024
6   Exp     0.5538  201024
7   Exp     2.0853  201024

I gather that I can divide all values for each date by the minimum of all values for that date using:

df['Value'] = df.groupby('Date')['Value'].transform(lambda x: x / x.min())

However, this lumps both groups together instead of using only the Cntrl group to calculate the minimum. Is there a way to nest an additional groupby call inside the lambda function, or to set up the transform differently, to accomplish this?


Solution

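  • Run groupby on the Cntrl rows only, then use reindex to repeat each date's minimum once per row of the full frame

    # Minimum Cntrl value per date; .values drops the Date index so the division is positional
    cntrl_min = df[df['Group'].eq('Cntrl')].groupby('Date')['Value'].min()
    df['Value'] /= cntrl_min.reindex(df['Date']).values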
    df
    Out[172]: 
       Group     Value    Date
    0  Cntrl  1.036341  201030
    1  Cntrl  1.000000  201030
    2    Exp  1.876903  201030
    3    Exp  2.899632  201030
    4  Cntrl  1.465406  201024
    5  Cntrl  1.000000  201024
    6    Exp  0.553875  201024
    7    Exp  2.085383  201024
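
As for setting up the transform differently: the sketch below shows two other ways that should give the same numbers, one using map and one using where followed by transform('min'). The names cntrl_min, by_map, masked, by_transform and result are only illustrative, and the code assumes every date has at least one Cntrl row.

```python
import pandas as pd

data = [['Cntrl', 0.844961, '201030'],
        ['Cntrl', 0.815331, '201030'],
        ['Exp', 1.530297, '201030'],
        ['Exp', 2.36416, '201030'],
        ['Cntrl', 1.867359, '201024'],
        ['Cntrl', 1.274295, '201024'],
        ['Exp', 0.7058, '201024'],
        ['Exp', 2.657393, '201024']]
df = pd.DataFrame(data, columns=['Group', 'Value', 'Date'])

is_cntrl = df['Group'].eq('Cntrl')

# Variant 1: build a Date -> minimum Cntrl value lookup, then map it onto each row.
cntrl_min = df.loc[is_cntrl].groupby('Date')['Value'].min()
by_map = df['Value'] / df['Date'].map(cntrl_min)

# Variant 2: replace non-Cntrl values with NaN, then let transform('min')
# broadcast each date's minimum (NaN is skipped) back to every row.
masked = df['Value'].where(is_cntrl)
by_transform = df['Value'] / masked.groupby(df['Date']).transform('min')

# Keep the original frame untouched and return a normalized copy.
result = df.assign(Value=by_map)
print(result)
```

Both variants keep the original row index, so no .values call is needed. A date without any Cntrl row would come out as NaN in either variant.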