Search code examples
pythonlambda

Filter Rows If The First Sign Change Is Detected In A Column By Group In Python


I have a dataframe:

col1 col2 group
10   1     A
11   2     A
12  -2     A
13  -3     B
14   3     B

How do I choose rows if there is a first sign change in col2 by group.

For instance, for the first unique value of group ('A'), the first sign change (+ to -) occurs at the third row (col1=12 and col2 = -2).

Desired result:

col1 col2 group
12  -2    A
14   3    B
data = {
    'col1': [10, 11, 12, 13, 14], 
    'col2': [1, 2, -2, -3, 3],
    'group': ['A', 'A', 'A', 'B', 'B']
    }  
df = pd.DataFrame(data)  

Solution

  • import pandas as pd
    
    ind = []
    
    
    def my_func(x):
        if x.loc[x.index[0], 'col2'] < 0:
            ind.append(x[x['col2'] >= 0].index[0])
        else:
            ind.append(x[x['col2'] < 0].index[0])
    
    
    df.groupby('group').apply(my_func)
    
    df1 = df.loc[ind]
    
    print(df1)
    

    Output

       col1  col2 group
    2    12    -2     A
    4    14     3     B
    

    The dataframe is grouped by the 'group' column. In the my_func function, it is checked if the first number is negative, then we are looking for the first non-negative one, we get its index. In else: opposite condition. Indexes are written to the ind list, which is then substituted into the dataframe to select the desired rows. Explicit loc indexing is used.

    Below is a variant with lambda:

    bbb = df.groupby('group').apply(
        lambda x: x[x['col2'] >= 0].index[0] if x.loc[x.index[0], 'col2'] < 0 else x[x['col2'] < 0].index[0])
    
    df1 = df.loc[bbb.values]
    
    print(df1)