Search code examples
pythonpandasgroup-bytransform

Subtract value from particular row using groupby transform


Have a dataframe containg several groups (column Id). Within each group there are several levels (column Level). All groups have a level named 'Base'. For each group I want to subtract the 'Base' value from the value at all the other levels.

Using pandas.join and a little back and forth I am able to get what I want.

import pandas as pd

df = pd.DataFrame({'Id':['A', 'A', 'A', 'B', 'B', 'B'],
                   'Level':['Down', 'Base', 'Up', 'Base', 'Down', 'Up'],
                   'Value':[8, 10, 15, 6, 3, 8]
                   }).set_index('Id')

df = df.join(df[df['Level']=='Base']['Value'], rsuffix='_Base')
df['Delta'] = df['Value'] - df['Value_Base']
df.drop('Value_Base', inplace=True, axis=1)

#The input
df_in
Out[3]: 
   Level  Value
Id             
A   Down      8
A   Base     10
A     Up     15
B   Base      6
B   Down      3
B     Up      8

# The output after the above operation (and hopefully after a groupby.transform)
df_out
Out[4]: 
   Level  Value  Delta
Id                    
A   Down      8     -2
A   Base     10      0
A     Up     15      5
B   Base      6      0
B   Down      3     -3
B     Up      8      2

The above solution is not too bad I guess, but I was hoping the same result could be achieved using groupby and transform. I have tried

df_in.groupby('Id').transform(lambda x : x['Value'] - x[x['Level']=='Base']['Value'])

but that did not work. Can anybody tell me what I am doing wrong?


Solution

  • If really need transform and always Base for each group one possible solution is create MultiIndex and then select by xs:

    df['Delta'] =df['Value'] - (df.set_index('Level', append=True)
                                  .groupby(level=0)['Value']
                                  .transform(lambda x:  x.xs('Base', level=1)[0])
                                  .values)
    print (df)
       Level  Value  Delta
    Id                    
    A   Down      8     -2
    A   Base     10      0
    A     Up     15      5
    B   Base      6      0
    B   Down      3     -3
    B     Up      8      2
    

    Similar solution working also if some Base not exist for group:

    f = lambda x:  next(iter(x.xs('Base', level=1)), np.nan)
    df = df.set_index('Level', append=True)
    df['Delta']  = df['Value'] - df.groupby(level=0)['Value'].transform(f)
    df = df.reset_index(level=1)                           
    print (df)
       Level  Value  Delta
    Id                    
    A   Down      8     -2
    A   Base     10      0
    A     Up     15      5
    B   Base      6      0
    B   Down      3     -3
    B     Up      8      2
    

    Better solution is:

    df['Delta'] = df['Value'] - df.index.map(df.loc[df['Level'].eq('Base'), 'Value'])
    print (df)
       Level  Value  Delta
    Id                    
    A   Down      8     -2
    A   Base     10      0
    A     Up     15      5
    B   Base      6      0
    B   Down      3     -3
    B     Up      8      2