I'm trying to get the latest occurrence of an ID Name and I want to delete every other occurrence that happened before that time.
ID Name Comment Time
0 W12D0 Fine 12:17:37
1 W12D0 Random 12:20:10
2 W12D0 What 12:21:06
3 W12D4 Fine 08:20:14
4 W12D5 Random 10:11:12
5 W12D5 Fine 11:37:02
.. ... ... ....
For example (according to the data above), the 'ID Name', 'W12D0', is associated with 3 occurrences: 12:17:37 , 12:20:10 , 12:21:06
I want to only keep the row associated to that ID Name's latest time (in this case, it's 12:21:06). Every other row with W12D0 will be deleted. Essentially, I want something like this:
ID Name Comment Time
0 W12D0 What 12:21:06
1 W12D4 Fine 08:20:14
2 W12D5 Fine 11:37:02
.. ... ... ....
How would I go about doing this?
Quick and dirty but should probably do the job!
import pandas as pd
import numpy as np
# load your data
data['Time'] = pd.to_datetime(data['Time'])
list_of_indexes = [np.argmin(g['Time']) for l, g in data.groupby('ID Name')]
data.ix[list_of_indexes]