python, pandas, dataframe, delete-row

Delete previous rows in Pandas Dataframe based on condition


I have a dataframe with a User_id column and some information about each user:

User_id   type     info
31       R*1005    no
31       R*10335   no
25       R*1005    no
25       R*243     no
25       R*4918    yes
25       R*9017    no
25       R*9015    no
46       R*9470    no

For each User_id, I want to drop the row where the info column is "yes", together with all of that user's rows before it. For the example above, the result should be:

User_id   type     info
31       R*1005    no
31       R*10335   no
25       R*9017    no
25       R*9015    no
46       R*9470    no

How can I do this in a smart way?


Solution

  • The idea is to test whether each User_id group contains at least one "yes", and for those groups to remove the "yes" row together with the rows before it:

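    # m: True on rows where info is "yes"
    m = df['info'].eq('yes')
    g = m.groupby(df['User_id'])
    
    # m1: True on every row of a user that has at least one "yes"
    m1 = g.transform('any')
    # m2: True from the user's first "yes" row onwards
    m2 = g.cumsum().ne(0)
    
    # keep users without any "yes", or rows after the first "yes"; drop the "yes" rows themselves
    df = df[(~m1 | m2) & ~m]
    print(df)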
       User_id     type info
    0       31   R*1005   no
    1       31  R*10335   no
    5       25   R*9017   no
    6       25   R*9015   no
    7       46   R*9470   no
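
To try the approach above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the same masks in a helper. The function name `drop_up_to_first_yes` and its parameters are made up for illustration and are not part of pandas. Note that, like the snippet above, it drops every "yes" row, and in groups with several "yes" rows it keeps the "no" rows after the first one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_id": [31, 31, 25, 25, 25, 25, 25, 46],
    "type": ["R*1005", "R*10335", "R*1005", "R*243",
             "R*4918", "R*9017", "R*9015", "R*9470"],
    "info": ["no", "no", "no", "no", "yes", "no", "no", "no"],
})


def drop_up_to_first_yes(frame, key="User_id", flag="info", value="yes"):
    """Hypothetical helper: per `key` group, drop the rows before the
    first `value` row, and drop all `value` rows themselves."""
    is_yes = frame[flag].eq(value)
    by_user = is_yes.groupby(frame[key])
    # True on every row of a group that contains at least one match.
    has_yes = by_user.transform("any")
    # Running count of matches per group; non-zero from the first match on.
    after_first_yes = by_user.cumsum().ne(0)
    return frame[(~has_yes | after_first_yes) & ~is_yes]


result = drop_up_to_first_yes(df)
print(result)
```

The original index labels are preserved, so the surviving rows can still be traced back to the input. If a fresh 0..n-1 index is wanted, `result.reset_index(drop=True)` can be chained on.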