I have a pandas dataframe that I have created as follows:
import pandas as pd
ds1 = {'col1':[1,2,3,4,5,6,7], "col2" : [1,1,0,1,1,1,1]}
df1 = pd.DataFrame(data=ds1)
The dataframe looks like this:
print(df1)
col1 col2
0 1 1
1 2 1
2 3 0
3 4 1
4 5 1
5 6 1
6 7 1
As soon as col2
is equal to 0, I want to remove all of the subsequent records, regardless of their values. In this case, the resulting dataframe would look like this:
col1 col2
0 1 1
1 2 1
2 3 0
Another example.
import pandas as pd
import numpy as np
ds1 = {'col1':[1,2,3,4,5,6,7], "col2" : [0,0,0,1,1,1,1]}
df1 = pd.DataFrame(data=ds1)
In this case the resulting dataframe would look like this:
col1 col2
0 1 0
Does anyone know how to do it in python?
Also, additional question.
import pandas as pd
ds1 = {'col1':[1,1,1,1,1,1,1, 2,2,2,2,2,2,2], "col2" : [1,1,0,1,1,1,1,1,1,0,1,1,1,1]}
df1 = pd.DataFrame(data=ds1)
print(df1)
col1 col2
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 1 1
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
13 2 1
I need to remove the records (same condition as above) BUT by col1. So the resulting dataframe would look like this:
col1 col2
0 1 1
1 1 1
2 1 0
7 2 1
8 2 1
9 2 0
Selecting by integer location:
df = df1.iloc[:df1[df1['col2'].eq(0)].index[0] + 1]
Or alternatively with df.loc
:
df = df1.loc[:df1[df1['col2'].eq(0)].index[0]]
col1 col2
0 1 1
1 2 1
2 3 0
Solution for your 2nd question:
df = (df1.groupby('col1', as_index=False)
.apply(lambda x: x.loc[:x[x['col2'].eq(0)].index[0]])
.reset_index(drop=True))
col1 col2
0 1 1
1 1 1
2 1 0
3 2 1
4 2 1
5 2 0