Search code examples
pythonfinancecountingcumulative-sumquantitative-finance

Count consecutive row values but reset count with every 0 in row


Within a dataframe, I need to count and sum consecutive row values in column A into a new column, column B.

Starting with column A, the script would count the consecutive runs in 1s but when a 0 appears it prints the total count in column B, it then resets the count and continues through the remaining data.

Desired outcome:

A | B
0   0
1   0
1   0
1   0
1   0
0   4
0   0
1   0
1   0
0   2

I've tried using .shift() along with various if-statements but have been unsuccessful.


Solution

  • This could be a way to do it. Probably there exists a more elegant solution.

    df['B'] = df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()).cumsum().shift(fill_value=0) * (df['A'].diff() == -1)
    

    This part df['A'].groupby(df['A'].ne(df['A'].shift()) groups the data by consecutive occurences of values. Then we take the cumsum which counts the cumulated sum along each group. Then we shift the results by 1 row because you want the count after the group. Then we mask out all the rows which are not the last row of the group + 1.