I have a pandas dataframe with values that needs to be totalized inside a period for each device, but the periods ends are marked in another column (an easy identifiable event line). The cumsum must go until it finds this end of period marker and then start again from zero(the first value on the next line).
device_name value end
0 A5 1 False
1 A5 7 False
2 A5 2 True
3 A5 1 False
4 A5 1 False
5 A5 1 False
6 A5 1 True
7 A6 2 False
8 A6 4 False
9 A6 2 False
10 A6 2 True
11 A6 2 False
12 A6 2 False
the result should be something like
device_name value end total
0 A5 1 False 1
1 A5 7 False 8
2 A5 2 True 10
3 A5 1 False 1
4 A5 1 False 2
5 A5 1 False 3
6 A5 1 True 4
7 A6 2 False 2
8 A6 4 False 6
9 A6 2 False 8
10 A6 2 True 10
11 A6 2 False 2
12 A6 2 False 4
I tried cumsum but I can't separate the end of a period from the other, I can iterate over rows and if i find a True on df.end.shift(1) i ignore the last value, but there are at least 60000 and probably there are faster methods, Any suggestion?
IIUC, you want to cumsum
per group until you reach a True. Then, after this row, restart the count.
You can use an extra group based on the "end" value (also using a cumsum
):
df['total'] = (df.groupby(['device_name',
df['end'].shift(1, fill_value=0).cumsum()])
['value'].cumsum())
output:
device_name value end total
0 A5 1 False 1
1 A5 7 False 8
2 A5 2 True 10
3 A5 1 False 1
4 A5 1 False 2
5 A5 1 False 3
6 A5 1 True 4
7 A6 2 False 2
8 A6 4 False 6
9 A6 2 False 8
10 A6 2 True 10
11 A6 2 False 2
12 A6 2 False 4
NB. note that I get a different value for row #2
NB.2. for purists, the extra group could also be computed using a groupby
. It doesn't really matter in this case. The internal groups will just not start from zero after the first group, but their name will not be used anywhere in the output.