Search code examples
pythonpandasformattingrowdataframe

How to keep the most recent values and delete all other others in a column (pandas)


I'm trying to get the latest occurrence of an ID Name and I want to delete every other occurrence that happened before that time.

    ID Name     Comment        Time
0     W12D0       Fine     12:17:37
1     W12D0     Random     12:20:10
2     W12D0       What     12:21:06
3     W12D4       Fine     08:20:14
4     W12D5     Random     10:11:12
5     W12D5       Fine     11:37:02
..      ...        ...         ....

For example (according to the data above), the 'ID Name', 'W12D0', is associated with 3 occurrences: 12:17:37 , 12:20:10 , 12:21:06

I want to only keep the row associated to that ID Name's latest time (in this case, it's 12:21:06). Every other row with W12D0 will be deleted. Essentially, I want something like this:

    ID Name     Comment        Time
0     W12D0       What     12:21:06
1     W12D4       Fine     08:20:14
2     W12D5       Fine     11:37:02
..      ...        ...         ....

How would I go about doing this?


Solution

  • Quick and dirty but should probably do the job!

    import pandas as pd
    import numpy as np
    
    # load your data
    
    data['Time'] = pd.to_datetime(data['Time'])
    
    list_of_indexes = [np.argmin(g['Time']) for l, g in data.groupby('ID Name')]
    data.ix[list_of_indexes]