I have a pandas dataframe that I have created as follows:
import pandas as pd
ds1 = {'col1':[1,1,1,1,1,1,1, 2,2,2,2,2,2,2], "col2" : [1,1,0,1,1,1,1,1,1,0,1,1,1,1]}
df1 = pd.DataFrame(data=ds1)
print(df1)
col1 col2
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 1 1
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
13 2 1
As soon as col2
is equal to 0, I want to remove all of the subsequent records BY the same col1
values. In this case, the resulting dataframe would look like this:
col1 col2
0 1 1
1 1 1
2 1 0
7 2 1
8 2 1
9 2 0
A possible solution:
(df1.groupby(['col1'], group_keys=False)
.apply(lambda g: g[~g['col2'].eq(0).cummax().shift(fill_value=False)]))
Alternatively:
g = df1.groupby(['col1'])
pd.concat([x[1][~x[1]['col2'].eq(0).cummax().shift(fill_value=False)] for x in g])
Output:
col1 col2
0 1 1
1 1 1
2 1 0
7 2 1
8 2 1
9 2 0