Search code examples
pythonpandascumsum

Conditional cumsum and reset to 0


I read this post but not exactly applicable to my case pandas-resetting cumsum at a specific number

My expected output on column "cumulation":

comment         count   pre_cnt_diff    cumulation
auto 1          10      0               0
auto 2          30      20              20
auto 3          70      40              60
auto 4          120     50              110
auto 5          120     0               110
auto 6          130     10              120
auto 7          150     20              140
manual_input_1  150     0               0
auto 8          200     50              50
auto 9          230     30              80
manual_input_2  230     0               0

My current output on column "cumulation":

comment         count   pre_cnt_diff    cumulation
auto 1          10      0               0
auto 2          30      20              20
auto 3          70      40              60
auto 4          120     50              110
auto 5          120     0               0
auto 6          130     10              10
auto 7          150     20              30
manual_input_1  150     0               0
auto 8          200     50              50
auto 9          230     30              80
manual_input_2  230     0               0

This is my code:

import pandas as pd

d = {'comment': ['auto 1', 'auto 2', 'auto 3', 'auto 4', 'auto 5', 'auto 6', 
'auto 7', 'manual input 1', 'auto 8', 'auto 9', 'manual input 2'],
'count': [10,30,70,120,120,130,150,150,200,230,230]}
df = pd.DataFrame(data=d)

df['pre_cnt_diff'] = df['count'].diff().fillna(0)
g = df.pre_cnt_diff.eq(0).cumsum().fillna(0)
df['cumulation'] = df.groupby(g).pre_cnt_diff.cumsum()

I want the column "cumulation" to reset to 0 when comment has a special pattern of string "manual_input" and continue to cumsum from last 0 value. I think this must to use regex to search my comment column and add if-else statement. Much appreciated if anyone can assist to modify my code.


Solution

  • Create custom groups:

    grps = df['comment'].str.contains(r'^manual input').cumsum()
    
    df['cumulation'] = df.groupby(grps)['pre_cont_diff'].cumsum()
    

    Output:

    >>> df
               comment  count  pre_cont_diff  cumulation
    0           auto 1     10            0.0         0.0
    1           auto 2     30           20.0        20.0
    2           auto 3     70           40.0        60.0
    3           auto 4    120           50.0       110.0
    4           auto 5    120            0.0       110.0
    5           auto 6    130           10.0       120.0
    6           auto 7    150           20.0       140.0
    7   manual input 1    150            0.0         0.0
    8           auto 8    200           50.0        50.0
    9           auto 9    230           30.0        80.0
    10  manual input 2    230            0.0         0.0
    

    Details:

    >>> pd.concat([df['comment'], grps], axis=1)
               comment  comment
    0           auto 1        0
    1           auto 2        0
    2           auto 3        0
    3           auto 4        0
    4           auto 5        0
    5           auto 6        0
    6           auto 7        0
    7   manual input 1        1
    8           auto 8        1
    9           auto 9        1
    10  manual input 2        2