I want to drop the first group of rows based on a column's value. Here is an example of a table
stage | h1 | h2 | h3 |
---|---|---|---|
0 | 4 | 55 | 55 |
0 | 5 | 66 | 44 |
0 | 4 | 66 | 33 |
1 | 3 | 33 | 55 |
0 | 5 | 44 | 33 |
Get the column stage, get all the first group of rows that start with 0, and drop the rows in the table. The table will look like this:
stage | h1 | h2 | h3 |
---|---|---|---|
1 | 3 | 33 | 55 |
0 | 5 | 44 | 33 |
This is what I did:
import pandas as pd
data = {'stage': [0, 0, 0, 1, 0],
'h1': [4, 5, 4, 3, 5],
'h2': [55, 66, 66, 33, 44],
'h3': [55, 44, 33, 55, 33]}
df = pd.DataFrame(data)
# Find indices of the first group of rows with uiwp_washing_stage = 0
indices_to_drop = []
for i in range(len(df)):
if df['stage'].iloc[i] == 0:
indices_to_drop.append(i)
else:
break
df = df.drop(indices_to_drop)
df = df.reset_index(drop=True)
print(df)
The above seems to work, but if the file is too big it takes a while, is there a Pands way of doing this?
df.iloc[df['stage'].diff().idxmax():]
First, find the first transition from 0 to 1 is by computing the difference between consecutive values in the stage
column (using diff
). Then use idxmax
to locate the index where the first transition occurs.
NB: In case, there are transitions that differ more than 1 unit, then use: df.iloc[df['stage'].diff().gt(0).idxmax():]
Output:
stage h1 h2 h3
3 1 3 33 55
4 0 5 44 33