python pandas dataframe pandas-groupby cumsum

Pandas group by cumsum with a flag condition


Assume I have the following DataFrame:

         date  flag user  num
0  2019-01-01     1    a   10
1  2019-01-02     0    a   20
2  2019-01-03     1    b   30
3  2019-03-04     1    b   40

I want a cumulative sum of num, grouped by user, that only counts rows where flag == 1. The expected result is:

         date  flag user  num  cumsum
0  2019-01-01     1    a   10      10
1  2019-01-02     0    a   20      10
2  2019-01-03     1    b   30      30
3  2019-03-04     1    b   40      70

So far I have been able to cumsum over the rows where flag == 1, ignoring the grouping by user:

df['cumsum'] = df[df['flag'] == 1]['num'].transform(pd.Series.cumsum)

or to cumsum by user, ignoring the flag:

df['cumsum'] = df.groupby('user')['num'].transform(pd.Series.cumsum)

I need help making them work together.


Solution

  • Multiply num by flag so that num becomes 0 wherever flag is 0, then group by user and take the cumulative sum. This works because flag only takes the values 0 and 1:

    df['cumsum'] = df['num'].mul(df['flag']).groupby(df['user']).cumsum()
    

    Output:

    >>> df
             date  flag user  num  cumsum
    0  2019-01-01     1    a   10      10
    1  2019-01-02     0    a   20      10
    2  2019-01-03     1    b   30      30
    3  2019-03-04     1    b   40      70