
Fill gaps between 1s in a pandas DataFrame column with increasing values that reset when the next 1 is reached


Apparently this is a more complicated problem than I thought. All I want to do is fill each run of zeros with values that increase by 1, counting on from the preceding 1 and resetting when the next 1 is reached.

My dataset has over 1 million rows, so I'd like to vectorize this operation if possible.

Here's a sample column:

import pandas as pd

# Define the input dataframe
df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

0     1
1     0
2     1
3     0
4     1
5     1
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    1
14    0

Desired result:

0     1
1     2
2     1
3     2
4     1
5     1
6     2
7     3
8     4
9     5
10    1
11    2
12    1
13    1
14    2

I've tried a number of approaches with ffill() and cumsum(), but the problem with cumsum() is that it doesn't reset the count at each 1.
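To make the cumsum() issue concrete, here is a small sketch on the sample data (the names zero_count and block_id are only illustrative). A running count of the zeros never restarts at the next 1. A running sum of the column itself doesn't count within a block either, but it does give each 1 and the zeros after it a shared label, which is the piece a groupby can build on.

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

# Running count of the zeros: keeps growing instead of restarting at each 1
zero_count = (df['col'] == 0).cumsum()
print(zero_count.tolist())
# [0, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 8]

# Running sum of the column: increases by 1 at every 1, so each 1 and the
# zeros that follow it end up with the same block label
block_id = df['col'].cumsum()
print(block_id.tolist())
# [1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 5, 5, 6, 7, 7]
```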


Solution

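  • Group by the cumulative sum of column col, so that each 1 and the zeros following it share a group label, then number the rows within each group with cumcount() (which starts at 0) and add 1: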

    df['col'] = df.groupby(df['col'].cumsum())['col'].cumcount() + 1
    

        col
    0     1
    1     2
    2     1
    3     2
    4     1
    5     1
    6     2
    7     3
    8     4
    9     5
    10    1
    11    2
    12    1
    13    1
    14    2
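Since the dataset has over a million rows, a pure NumPy variant that skips the groupby may also be worth trying. This is a sketch, not part of the original answer, and it assumes col contains only 0s and 1s: it records the position of every 1, carries the most recent one forward with np.maximum.accumulate, and measures each row's distance from it. Names such as last_one are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

a = df['col'].to_numpy()
idx = np.arange(len(a))

# Position of the most recent 1 at or before each row (0 before the first 1)
last_one = np.maximum.accumulate(np.where(a == 1, idx, 0))

# Distance from that 1, counting from 1
result = idx - last_one + 1
print(result.tolist())
# [1, 2, 1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 1, 1, 2]

# Cross-check against the groupby/cumcount one-liner
expected = df.groupby(df['col'].cumsum())['col'].cumcount() + 1
assert (result == expected.to_numpy()).all()
```

Rows before the first 1 (if any) are numbered 1, 2, ... from the top, which matches what the groupby version produces for that leading group. Whether this is actually faster than the groupby on your data is worth timing rather than assuming.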