Search code examples
pythonpandasgroup-bybinarycumsum

In python I want to create a 'group_id' column based on a binary column with the count increasing for values that are 1 and staying constant for 0


The group_id should be increasing for each binary that is = 1 but grouping all consecutive 0's into one group_id, with the next group of consecutive 0's being in another group_id.

I have been using

df['group_id1'] = df['diff'].cumsum()

and getting the group_id1 column as a result. The result I am looking for is group_id2.

    diff    group_id1   group_id2
0   1   1   1
1   0   1   2
2   0   1   2
3   0   1   2
4   0   1   2
5   0   1   2
6   0   1   2
7   0   1   2
8   0   1   2
9   1   2   3
10  1   3   4
11  1   4   5
12  1   5   6
13  1   6   7
14  1   7   8
15  1   8   9
16  1   9   10
17  1   10  11
18  0   10  12
19  0   10  12

Solution

  • Assuming you only have 0/1, you can use diff to identify the 0s preceded by 1s:

    df['group_id2'] = (df['diff'].eq(1)|df['diff'].diff().eq(-1)).cumsum()
    

    If you have other values go with shift:

    df['group_id2'] = (df['diff'].eq(1)
                      |(df['diff'].eq(0)&df['diff'].shift().eq(1))
                      ).cumsum()
    

    Another variant:

    s = df['diff'].eq(1)
    df['group_id2'] = (s|s.shift()).cumsum()
    

    Output:

        diff  group_id2
    0      1          1
    1      0          2
    2      0          2
    3      0          2
    4      0          2
    5      0          2
    6      0          2
    7      0          2
    8      0          2
    9      1          3
    10     1          4
    11     1          5
    12     1          6
    13     1          7
    14     1          8
    15     1          9
    16     1         10
    17     1         11
    18     0         12
    19     0         12