Search code examples
pythondataframecumsum

Conditional Cumsum of one dataframe, dependent on a different dataframe


I have two data frames that are exactly the same size, and I want to cumsum everything to the left of a specific cell on one dataframe assuming the value is the same in the other. If the value changes in the other dataframe, I would want to reset and set that as the endpoint to stop summing from the dataframe.

Summing Table:

Summing 01-Jan 02-Jan 03-Jan
X 100 100 100
Y 250 100 150
Z 300 300 300

Conditional Table:

Condition 01-Jan 02-Jan 03-Jan
X 0 0 0
Y 1 2 2
Z 1 2 3

Expected output:

Expected 01-Jan 02-Jan 03-Jan
X 100 200 300
Y 250 100 250
Z 300 300 300

Tried original cumsum function and it didn't work really well, trying to implement a new definition to possibly mark the places where the cumsum stops. Was thinking I might be able to use melt to find the changing points but unsure how that would benefit.


Solution

  • Assuming df1 and df2 your inputs, you can temporarily reshape your DataFrame to run a groupby.cumsum:

    out = (df1
     .set_index('Summing').stack()
     .groupby(['Summing', df2.set_index('Condition').stack()])
     .cumsum()
     .unstack().reset_index()
    )
    

    Output:

      Summing  01-Jan  02-Jan  03-Jan
    0       X     100     200     300
    1       Y     250     100     250
    2       Z     300     300     300