Search code examples
pythonpandasnumpysumgroup

Pandas - how to aggregate values between 2 ranges in a specific column


I'm working on a df with 2 columns e.g.

column1 = [False, False, False, True, False, False, True]
column2 = [1, 1, 1, 1, 1, 1, 1]

I want to sum all "False" values until the first "True" value, and again sum the following "False" values until the next "True" etc.

The output should be

column3 = [0,0,0,3,0,0,2]

I tried to sum column values but I can't "reset" the counter once hitting a "True" from a different column


Solution

  • You can use:

    df['column3'] = (df['column2']
     .mask(df['column1']) # get False values only
     .groupby(df.loc[::-1, 'column1'].cumsum()) # group with next True
     # get sum of False values only where True
     .transform('sum').where(df['column1'], 0).convert_dtypes()
    )
    

    Output:

       column1  column2  column3
    0    False        1        0
    1    False        1        0
    2    False        1        0
    3     True        1        3
    4    False        1        0
    5    False        1        0
    6     True        1        2