Search code examples
pythonpandassum

Cumulative Sum that resets based on specific condition


Let's say I have the following data:

df=pd.DataFrame({'Days':[1,2,3,4,1,2,3,4],
                'Flag':["First","First","First","First","Second","Second","Second","Second"],
                'Payments':[1,2,3,4,9,3,1,6]})

I want to create a cumulative sum for payments, but it has to reset when flag turns from first to second. Any help?

The output that I'm looking for is the following: enter image description here


Solution

  • You can use df['Flag'].ne(df['Flag'].shift()).cumsum() to generate a grouper that will group by changes in the Flag column. Then, group by that, and cumsum:

    df['cumsum'] = df['Payments'].groupby(df['Flag'].ne(df['Flag'].shift()).cumsum()).cumsum()
    

    Output:

    >>> df
       Days    Flag  Payments  cumsum
    0     1   First         1       1
    1     2   First         2       3
    2     3   First         3       6
    3     4   First         4      10
    4     1  Second         9       9
    5     2  Second         3      12
    6     3  Second         1      13
    7     4  Second         6      19