Search code examples
pandassortingdrop

Pandas Dataframe keep the most recent incidence


I have below df, i want to keep the most recent incidence for the ID. Incident range from 1 - 3. 1 is most recent and 3 is least.

ID Incident
001 1
001 2
001 3
002 1
002 1
002 1
002 1
002 3
A003 2
A003 2
A003 3

Output

ID Incident
001 1
002 1
002 1
002 1
002 1
A003 2
A003 2

Solution

  • You can first group your dataset based on your ID, and select the minimum for each group using groupby() and min().

    Then you join this result with your original dataframe using merge and with parameter indicator set to True which will show which rows belong to the minimum values.

    Last step will be to using loc to filter your dataframe will get you your answer:

    m = df.groupby('ID',as_index=False).agg({'Incident':'min'})
    out = df.loc[pd.merge(df,m,how='left',indicator=True)._merge.eq('both')]
    

    prints back:

    print(out)
    
         ID  Incident
    0     1         1
    3     2         1
    4     2         1
    5     2         1
    6     2         1
    8  A003         2
    9  A003         2