python python-3.x pandas indexing multi-index

looping through a dataframe to return rows based on 2 indexes in python

I have a multi indexed dataframe sorted by name and date and time and i would like to go through the dataframe to select the latest(time wise) row for each day. see example below:

 name          time            code
Trudy   1/2/2018 2:36PM       126-45
        1/2/2018 3:58PM       127-45
        1/3/2018 5:25PM       122-56
        1/4/2018 6:57PM       122-57
        1/4/2018 9:25PM       122-89
Monk    1/2/2018 3:58PM       147-56
        1/3/2018 4:20PM       258-69
        1/3/2018 5:25PM       259-87
        1/6/2018 6:57PM       254-69
Natalie 1/2/2018 5:21PM       148-35
        1/2/2018 3:54PM       458-69
        1/4/2018 2:26PM       249-47

and i want to get the following result :

 name          time            code
Trudy   1/2/2018 3:58PM       127-45
        1/3/2018 5:25PM       122-56
        1/4/2018 9:25PM       122-89
Monk    1/2/2018 3:58PM       147-56
        1/3/2018 5:25PM       259-87
        1/6/2018 6:57PM       254-69
Natalie 1/2/2018 5:21PM       148-35
        1/4/2018 2:26PM       249-47

Solution

If this is your data (once you reset_index()):

df = pd.DataFrame({'code': ['126-45', '127-45', '122-56', '122-57', '122-89', '147-56', '258-69', '259-87', '254-69', '148-35', '458-69', '249-47'],
                   'name': ['Trudy', 'Trudy', 'Trudy', 'Trudy', 'Trudy', 'Monk', 'Monk', 'Monk', 'Monk', 'Natalie', 'Natalie', 'Natalie'],
                   'time': ['1/2/2018 2:36PM', '1/2/2018 3:58PM', '1/3/2018 5:25PM', '1/4/2018 6:57PM', '1/4/2018 9:25PM', '1/2/2018 3:58PM', '1/3/2018 4:20PM', '1/3/2018 5:25PM', '1/6/2018 6:57PM', '1/2/2018 5:21PM', '1/2/2018 3:54PM', '1/4/2018 2:26PM']})

df['time'] = pd.to_datetime(df['time'], format='%d/%m/%Y %I:%M%p')

#      code     name                time
#0   126-45    Trudy 2018-02-01 14:36:00
#1   127-45    Trudy 2018-02-01 15:58:00
#2   122-56    Trudy 2018-03-01 17:25:00
#3   122-57    Trudy 2018-04-01 18:57:00
#4   122-89    Trudy 2018-04-01 21:25:00
#5   147-56     Monk 2018-02-01 15:58:00
#6   258-69     Monk 2018-03-01 16:20:00
#7   259-87     Monk 2018-03-01 17:25:00
#8   254-69     Monk 2018-06-01 18:57:00
#9   148-35  Natalie 2018-02-01 17:21:00
#10  458-69  Natalie 2018-02-01 15:54:00
#11  249-47  Natalie 2018-04-01 14:26:00

And if you don't matter about time part, use the following:

df.groupby(['name', pd.Grouper(key='time', freq='D')])['time', 'code'].max()

#                      code
#name    time              
#Monk    2018-02-01  147-56
#        2018-03-01  259-87
#        2018-06-01  254-69
#Natalie 2018-02-01  458-69
#        2018-04-01  249-47
#Trudy   2018-02-01  127-45
#        2018-03-01  122-56
#        2018-04-01  122-89