The group_id should be increasing for each binary that is = 1 but grouping all consecutive 0's into one group_id, with the next group of consecutive 0's being in another group_id.
I have been using
df['group_id1'] = df['diff'].cumsum()
and getting the group_id1 column as a result. The result I am looking for is group_id2.
diff group_id1 group_id2
0 1 1 1
1 0 1 2
2 0 1 2
3 0 1 2
4 0 1 2
5 0 1 2
6 0 1 2
7 0 1 2
8 0 1 2
9 1 2 3
10 1 3 4
11 1 4 5
12 1 5 6
13 1 6 7
14 1 7 8
15 1 8 9
16 1 9 10
17 1 10 11
18 0 10 12
19 0 10 12
Assuming you only have 0/1, you can use diff
to identify the 0s preceded by 1s:
df['group_id2'] = (df['diff'].eq(1)|df['diff'].diff().eq(-1)).cumsum()
If you have other values go with shift
:
df['group_id2'] = (df['diff'].eq(1)
|(df['diff'].eq(0)&df['diff'].shift().eq(1))
).cumsum()
Another variant:
s = df['diff'].eq(1)
df['group_id2'] = (s|s.shift()).cumsum()
Output:
diff group_id2
0 1 1
1 0 2
2 0 2
3 0 2
4 0 2
5 0 2
6 0 2
7 0 2
8 0 2
9 1 3
10 1 4
11 1 5
12 1 6
13 1 7
14 1 8
15 1 9
16 1 10
17 1 11
18 0 12
19 0 12