I have a dataset like this :
I want to make a two columns of cumulative sum of reference column. But I want to make it reset with two separate condition.
For example, when I meet conditon 1, I want to reset just one column considering condition 1, but not condition 2. In that case, the other column considering only condion 2 should continue without reset.
reference_cumsum=[1,2,3,None,2,4,6,None,1,2,3]
condition=[None,None,None,'condtion1',None,None,None,'conditon2',None,None,None]
pd.DataFrame({'ref_cumsum':reference_cumsum,'event_condition':condition})
I have try a previous solution from this link https://stackoverflow.com/questions/45964740/python-pandas-cumsum-with-reset-everytime-there-is-a-0
But I don't know how to do it with multiple condition.
I have to obtain a dataframe like this :
You can use a loop with a changing groupby.cumsum
on all unique
conditions:
for cond in df['event_condition'].dropna().unique():
df[f'cumsum_{cond}'] = df.groupby(df['event_condition'].eq(cond).cumsum())['ref_cumsum'].cumsum()
Note that there are two cumsum
in the code. The first one is to set up the grouper, the second one is the actual cumulated sum of the values.
Output:
ref_cumsum event_condition cumsum_condtion1 cumsum_conditon2
0 1.0 None 1.0 1.0
1 2.0 None 3.0 3.0
2 3.0 None 6.0 6.0
3 NaN condtion1 NaN NaN
4 2.0 None 2.0 8.0
5 4.0 None 6.0 12.0
6 6.0 None 12.0 18.0
7 NaN conditon2 NaN NaN
8 1.0 None 13.0 1.0
9 2.0 None 15.0 3.0
10 3.0 None 18.0 6.0