Search code examples
pythonpandasgroup-bycumsum

Pandas: conditional, groupby, cumsum


I have dataframe where i'm trying to create a new column showing the value with conditional groupby.

conditions:

  1. tag == 1, profit - cost
  2. tag > 1, -(cost)
  3. net is summed after every iteration

original df:

╔══════╦═════╦══════╦════════╗
║ rep  ║ tag ║ cost ║ profit ║
╠══════╬═════╬══════╬════════╣
║ john ║   1 ║   1  ║      5 ║
║ pete ║   2 ║   1  ║      3 ║
║ pete ║   3 ║   1  ║      4 ║
║ pete ║   4 ║   1  ║      5 ║
║ john ║   5 ║   1  ║      7 ║
║ john ║   1 ║   1  ║      9 ║
║ pete ║   1 ║   1  ║      3 ║
║ john ║   3 ║   1  ║      5 ║
╚══════╩═════╩══════╩════════╝

output hope to get:

╔══════╦═════╦══════╦════════╦═════╗
║ rep  ║ tag ║ cost ║ profit ║ net ║
╠══════╬═════╬══════╬════════╬═════╣
║ john ║   1 ║   1  ║      5 ║   4 ║
║ pete ║   2 ║   1  ║      3 ║  -1 ║
║ pete ║   3 ║   1  ║      4 ║  -2 ║
║ pete ║   4 ║   1  ║      5 ║  -3 ║
║ john ║   5 ║   1  ║      7 ║   3 ║
║ john ║   1 ║   1  ║      9 ║  11 ║
║ pete ║   1 ║   1  ║      3 ║  -1 ║
║ john ║   3 ║   1  ║      5 ║  15 ║
╚══════╩═════╩══════╩════════╩═════╝

I'm able to use loc to derives 'if' conditions, however, i'm not able to figure out or find examples of groupby/if/cumsum for this.

sample df code:

data = {
    "rep": ['john','pete','pete','pete','john','john','pete','john'],
    "tag": [1,2,3,4,5,1,1,3],
    "cost": [1,1,1,1,1,1,1,1],
    "profit": [5,3,4,5,7,9,3,5]}

df = pd.DataFrame(data)

kindly advise


Solution

  • cond1 = df['tag'].eq(1)
    s = df['cost'].mul(-1).mask(cond1, df['profit'] - df['cost'])
    

    s

    0    4
    1   -1
    2   -1
    3   -1
    4   -1
    5    8
    6    2
    7   -1
    Name: cost, dtype: int64
    

    df['net'] = s.groupby(df['rep']).cumsum()
    

    df

        rep    tag  cost    profit  net
    0   john    1   1       5       4
    1   pete    2   1       3       -1
    2   pete    3   1       4       -2
    3   pete    4   1       5       -3
    4   john    5   1       7       3
    5   john    1   1       9       11
    6   pete    1   1       3       -1
    7   john    3   1       5       10
    

    Other Way

    if you want 1-line code use following

    out = (df.assign(net=df['cost'].mul(-1)
                     .mask(df['tag'].eq(1), df['profit'].sub(df['cost']))
                     .groupby(df['rep']).cumsum()))
    

    same result