python pandas group-by conditional-statements cumsum

pandas - increment cumulative sum only when string is matched


I am looking to accumulate values under a couple of conditions: I only want to add the value if Condition == New, and the cumulative values should be grouped by Name. Say I have the data frame, df:

Name | Condition | Value 
A    | New       | 5     
A    | Not New   | 7     
A    | New       | 15    
B    | Old       | 20    
B    | New       | 14    
B    | Old       | 3     

I am looking for the output:

Name | Condition | Value | Cumulative Value
A    | New       | 5     | 5
A    | Not New   | 7     | 5
A    | New       | 15    | 20
B    | Old       | 20    | 0
B    | New       | 14    | 14
B    | Old       | 3     | 14

For each entry where Condition == New, I add the current Value to the current Cumulative Value. Otherwise, the Cumulative Value should stay unchanged. When a new Name comes in, the Cumulative Value resets: it starts at zero if the first entry for Condition is not 'New' (for example 'Old'), or at the first Value if the first entry for Condition is 'New'.
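To pin down the rule, here is a minimal sketch of it as a plain Python loop over the sample rows, without pandas. The function name `conditional_running_total` and the `match` parameter are hypothetical names chosen for illustration; this is only a reference for the expected numbers, not the approach I am asking for.

```python
# Sample rows from the table above: (Name, Condition, Value)
rows = [
    ("A", "New", 5),
    ("A", "Not New", 7),
    ("A", "New", 15),
    ("B", "Old", 20),
    ("B", "New", 14),
    ("B", "Old", 3),
]

def conditional_running_total(rows, match="New"):
    """Running total per Name that only grows on rows whose Condition equals `match`."""
    totals = {}  # running total per Name; a Name not seen yet starts at 0
    out = []
    for name, condition, value in rows:
        total = totals.get(name, 0)
        if condition == match:
            total += value  # only matching rows add to the total
        totals[name] = total
        out.append(total)  # non-matching rows repeat the previous total
    return out

print(conditional_running_total(rows))  # [5, 5, 20, 0, 14, 14]
```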

I am stuck with the code

df['Cumulative Value'] = df.groupby(['Name'])['Value'].cumsum()

which groups by Name and computes the cumulative sum of every Value. I am not sure how to add a condition that checks the string in the Condition column and keeps Cumulative Value unchanged when it does not match.


Solution

  • You can replace Value with 0 wherever Condition is not equal to New, and then take the cumulative sum within each Name group:

    df['Cumulative Value'] = df.Value.where(df.Condition == 'New', 0).groupby(df.Name).cumsum()
    
    df
      Name Condition  Value  Cumulative Value
    0    A       New      5                 5
    1    A   Not New      7                 5
    2    A       New     15                20
    3    B       Old     20                 0
    4    B       New     14                14
    5    B       Old      3                14
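
For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds the sample frame from the question and applies the same `where` + `groupby` + `cumsum` idea. The intermediate variable `masked` is a name introduced here for readability, and bracket access is used instead of attribute access only as a stylistic choice.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Condition": ["New", "Not New", "New", "Old", "New", "Old"],
    "Value": [5, 7, 15, 20, 14, 3],
})

# Keep Value where Condition is 'New' and use 0 elsewhere,
# so non-matching rows contribute nothing to the running total
masked = df["Value"].where(df["Condition"] == "New", 0)

# Cumulative sum of the masked values, restarted for each Name
df["Cumulative Value"] = masked.groupby(df["Name"]).cumsum()

print(df["Cumulative Value"].tolist())  # [5, 5, 20, 0, 14, 14]
```

Because the non-matching rows are turned into zeros rather than dropped, the result keeps the original index and row order, which is why it can be assigned straight back as a new column.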