I am trying to compute a cumulative sum over intervals, i.e. with the running total reset to zero whenever the next value to accumulate is 0. Below is an example, followed by the desired result. I have tried NumPy's 'convolve' and pandas' 'groupby', but I can't come up with a way to do the reset other than writing a function that loops over all the rows. Is there a clever approach I'm missing? Note that the real data in column 'x' are real numbers separated by 0s.
import numpy as np
import pandas as pd
a = pd.DataFrame([[0,0],[1,0],[1,0],[1,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],\
[0,0],[0,0],[0,0],[0,0],[1,0],[1,0],[0,0]], columns=["x","y"])
def patch(k):
    k["z"] = k.x.cumsum()
    return k
print(patch(a))
Current output:
    x  y  z
0   0  0  0
1   1  0  1
2   1  0  2
3   1  0  3
4   0  0  3
5   0  0  3
6   0  0  3
7   0  0  3
8   0  0  3
9   0  0  3
10  0  0  3
11  0  0  3
12  0  0  3
13  0  0  3
14  1  0  4
15  1  0  5
16  0  0  5
Desired output:
    x  y  z
0   0  0  0
1   1  0  1
2   1  0  2
3   1  0  3
4   0  0  0
5   0  0  0
6   0  0  0
7   0  0  0
8   0  0  0
9   0  0  0
10  0  0  0
11  0  0  0
12  0  0  0
13  0  0  0
14  1  0  1
15  1  0  2
16  0  0  0
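Group by the running count of zeros in 'x', then take the cumulative sum within each group. Every zero starts a new group, so the sum resets there: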
a['z'] = a.groupby(a['x'].eq(0).cumsum())['x'].cumsum()
Output:
    x  y  z
0   0  0  0
1   1  0  1
2   1  0  2
3   1  0  3
4   0  0  0
5   0  0  0
6   0  0  0
7   0  0  0
8   0  0  0
9   0  0  0
10  0  0  0
11  0  0  0
12  0  0  0
13  0  0  0
14  1  0  1
15  1  0  2
16  0  0  0
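To see why this works, it may help to look at the grouping key itself. Below is a minimal sketch on made-up float data (the values, the frame name `df`, and the extra `key` column are assumptions for illustration, not part of the question), since the question says the real 'x' holds real numbers separated by zeros:

```python
import pandas as pd

# Hypothetical float data: runs of non-zero values separated by zeros.
df = pd.DataFrame({"x": [0.0, 1.5, 2.5, 0.0, 0.0, 4.0, 1.0, 0.0]})

# eq(0) marks the zeros; cumsum() turns the marks into a running count,
# so the key increases by one at every zero and each zero opens a new group.
key = df["x"].eq(0).cumsum()
df["key"] = key  # kept only to make the grouping visible

# Cumulative sum within each group. The zero that opens a group adds 0,
# so the running total effectively restarts at every zero.
df["z"] = df.groupby(key)["x"].cumsum()

print(df)
```

Here the key comes out as 1, 1, 1, 2, 3, 3, 3, 4 and 'z' as 0, 1.5, 4, 0, 0, 4, 5, 0. Note that the comparison is an exact test against 0, so values that are only approximately zero would not trigger a reset.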