Search code examples
pythonpandaslambdaapplysubtraction

Groupby two columns and create a new column based on a conditional subtraction in python


I'm trying to create a new column based on a conditional subtraction in python. I want to first group the dataframe by column A and D, then take the row value of C where B equals 2, and subtract that value from all values in column C.

import pandas as pd

data = [
["R", 1, 2, "p"],
["R", 2, 4, "p"],
["R", 3, 6, "p"],
["R", 4, 8, "p"],
["R", 1, 6, "o"],
["R", 2, 3, "o"],
["R", 3, 1, "o"],
["R", 4, 2, "o"],
["S", 0, 5, "n"],
["S", 1, 4, "n"],
["S", 2, 1, "n"],
["S", 3, 3, "n"],
["S", 0, 3, "g"],
["S", 1, 2, "g"],
["S", 2, 9, "g"],
["S", 3, 7, "g"]]

df = pd.DataFrame(data=data, columns=["a", "b", "c", "d"])
df

Out[1]:
    a   b   c   d
0   R   1   2   p
1   R   2   4   p
2   R   3   6   p
3   R   4   8   p
4   R   1   6   o
5   R   2   3   o
6   R   3   1   o
7   R   4   2   o
8   S   0   5   n
9   S   1   4   n
10  S   2   1   n
11  S   3   3   n
12  S   0   3   g
13  S   1   2   g
14  S   2   9   g
15  S   3   7   g

Want it to yield the column 'e' of:

Out[2]:
    a   b   c   d   e 
0   R   1   2   p  -2
1   R   2   4   p   0 
2   R   3   6   p   2
3   R   4   8   p   4
4   R   1   6   o   3
5   R   2   3   o   0
6   R   3   1   o  -2
7   R   4   2   o  -1
8   S   0   5   n   4
9   S   1   4   n   3
10  S   2   1   n   0
11  S   3   3   n   2
12  S   0   3   g  -6
13  S   1   2   g  -7
14  S   2   9   g   0
15  S   3   7   g  -2

I would appreciate any tips of how to use transform or map functions to solve the problem.


Solution

  • IIUC, you can use a mask before using groupby.transform('first'):

    df['e'] = df['c'] - (df['c'].where(df['b'].eq(2))
                                .groupby([df['a'], df['d']])
                                .transform('first')
                                .convert_dtypes()
                        )
    

    output:

        a  b  c  d   e
    0   R  1  2  p  -2
    1   R  2  4  p   0
    2   R  3  6  p   2
    3   R  4  8  p   4
    4   R  1  6  o   3
    5   R  2  3  o   0
    6   R  3  1  o  -2
    7   R  4  2  o  -1
    8   S  0  5  n   4
    9   S  1  4  n   3
    10  S  2  1  n   0
    11  S  3  3  n   2
    12  S  0  3  g  -6
    13  S  1  2  g  -7
    14  S  2  9  g   0
    15  S  3  7  g  -2