Search code examples
pythonpandas

Pandas dropping first group of values


I want to drop the first group of rows based on a column's value. Here is an example of a table

stage h1 h2 h3
0 4 55 55
0 5 66 44
0 4 66 33
1 3 33 55
0 5 44 33

Get the column stage, get all the first group of rows that start with 0, and drop the rows in the table. The table will look like this:

stage h1 h2 h3
1 3 33 55
0 5 44 33

This is what I did:

import pandas as pd

data = {'stage': [0, 0, 0, 1, 0],
        'h1': [4, 5, 4, 3, 5],
        'h2': [55, 66, 66, 33, 44],
        'h3': [55, 44, 33, 55, 33]}

df = pd.DataFrame(data)

# Find indices of the first group of rows with uiwp_washing_stage = 0
indices_to_drop = []
for i in range(len(df)):
    if df['stage'].iloc[i] == 0:
        indices_to_drop.append(i)
    else:
        break

df = df.drop(indices_to_drop)
df = df.reset_index(drop=True)
print(df)

The above seems to work, but if the file is too big it takes a while, is there a Pands way of doing this?


Solution

  • df.iloc[df['stage'].diff().idxmax():]
    

    First, find the first transition from 0 to 1 is by computing the difference between consecutive values in the stage column (using diff). Then use idxmax to locate the index where the first transition occurs.

    NB: In case, there are transitions that differ more than 1 unit, then use: df.iloc[df['stage'].diff().gt(0).idxmax():]

    Output:

       stage  h1  h2  h3
    3      1   3  33  55
    4      0   5  44  33