I have two data frames that are exactly the same size, and I want to cumsum everything to the left of a specific cell on one dataframe assuming the value is the same in the other. If the value changes in the other dataframe, I would want to reset and set that as the endpoint to stop summing from the dataframe.
Summing Table:
Summing | 01-Jan | 02-Jan | 03-Jan |
---|---|---|---|
X | 100 | 100 | 100 |
Y | 250 | 100 | 150 |
Z | 300 | 300 | 300 |
Conditional Table:
Condition | 01-Jan | 02-Jan | 03-Jan |
---|---|---|---|
X | 0 | 0 | 0 |
Y | 1 | 2 | 2 |
Z | 1 | 2 | 3 |
Expected output:
Expected | 01-Jan | 02-Jan | 03-Jan |
---|---|---|---|
X | 100 | 200 | 300 |
Y | 250 | 100 | 250 |
Z | 300 | 300 | 300 |
Tried original cumsum function and it didn't work really well, trying to implement a new definition to possibly mark the places where the cumsum stops. Was thinking I might be able to use melt to find the changing points but unsure how that would benefit.
Assuming df1
and df2
your inputs, you can temporarily reshape your DataFrame to run a groupby.cumsum
:
out = (df1
.set_index('Summing').stack()
.groupby(['Summing', df2.set_index('Condition').stack()])
.cumsum()
.unstack().reset_index()
)
Output:
Summing 01-Jan 02-Jan 03-Jan
0 X 100 200 300
1 Y 250 100 250
2 Z 300 300 300