I was looking out to extract consecutive rows with specified text repeated continuously for more than 5 times.
ex:
A B C
10 john 1
12 paul 1
23 kishan 1
12 teja 1
12 zebo 1
324 vauh -1
3434 krish -1
232 poo -1
4535 zoo 1
4343 doo 1
342 foo -1
123 soo 1
121 koo -1
34 loo -1
343454 moo -1
565343 noo -1
2323234 voo -1
3434 coo 1
545 xoo 1
6565 zoo 1
232321 qoo 1
34454 woo 1
546556 eoo 1
65665 roo -1
5343 too -1
3232 yoo 1
1212 uoo 1
23355667 ioo 1
787878 joo -1
I am looking out for the below result where the column value 'c' has consecutive 1's repeated more than 4 times as different groups .
Output:
A B C group
10 john 1 1
12 paul 1 1
23 kishan 1 1
12 teja 1 1
12 zebo 1 1
3434 coo 1 2
545 xoo 1 2
6565 zoo 1 2
232321 qoo 1 2
34454 woo 1 2
546556 eoo 1 2
Using masks and factorize
:
# identify 1s
m = df['C'].eq(1)
# group consecutive values
g = m.ne(m.shift()).cumsum()
# identify stretches of 5+ 1s
m2 = m & df.groupby(g)['C'].transform('size').ge(5)
out = (df.loc[m2]
.assign(group=pd.factorize(g[m2])[0]+1)
)
Output:
A B C group
0 10 john 1 1
1 12 paul 1 1
2 23 kishan 1 1
3 12 teja 1 1
4 12 zebo 1 1
17 3434 coo 1 2
18 545 xoo 1 2
19 6565 zoo 1 2
20 232321 qoo 1 2
21 34454 woo 1 2
22 546556 eoo 1 2