Search code examples
pythonpandascumsum

Cumsum Reset based on a condition in Pandas


My question is very similar to Cumsum within group and reset on condition in pandas and Pandas: cumsum per category based on additional condition but they don't quite get me there due to my conditional requirements. I have a data frame that looks like this:

  TransactionId     Delta
          14          2
          14          3
          14          1
          14          2
          15          4
          15          2
          15          3

I want to create another column "Cumulative" that calculates the cumsum of Delta for each TransactionId. So the result would look like this:

  TransactionId     Delta    Cumulative
          14          2          2
          14          3          5
          14          1          6
          14          2          8
          15          4          4
          15          2          6
          15          3          9

I have the condition for checking TransactionId equality setup:

c1 = df.TransactionId.eq(df.TransactionId.shift())

But I can't figure out how to add the Delta value to the previous Cumulative row.


Solution

  • Use groupby.cumsum:

    df['Cumulative'] = df.groupby('TransactionId')['Delta'].cumsum()
    
    print (df)
    
      TransactionId  Delta  Cumulative
    0       14         2       2
    1       14         3       5
    2       14         1       6
    3       14         2       8
    4       15         4       4 
    5       15         2       6 
    6       15         3       9