Search code examples
pythonpython-3.xpandasindexingmulti-index

looping through a dataframe to return rows based on 2 indexes in python


I have a multi indexed dataframe sorted by name and date and time and i would like to go through the dataframe to select the latest(time wise) row for each day. see example below:

 name          time            code
Trudy   1/2/2018 2:36PM       126-45
        1/2/2018 3:58PM       127-45
        1/3/2018 5:25PM       122-56
        1/4/2018 6:57PM       122-57
        1/4/2018 9:25PM       122-89
Monk    1/2/2018 3:58PM       147-56
        1/3/2018 4:20PM       258-69
        1/3/2018 5:25PM       259-87
        1/6/2018 6:57PM       254-69
Natalie 1/2/2018 5:21PM       148-35
        1/2/2018 3:54PM       458-69
        1/4/2018 2:26PM       249-47

and i want to get the following result :

 name          time            code
Trudy   1/2/2018 3:58PM       127-45
        1/3/2018 5:25PM       122-56
        1/4/2018 9:25PM       122-89
Monk    1/2/2018 3:58PM       147-56
        1/3/2018 5:25PM       259-87
        1/6/2018 6:57PM       254-69
Natalie 1/2/2018 5:21PM       148-35
        1/4/2018 2:26PM       249-47

Solution

  • If this is your data (once you reset_index()):

    df = pd.DataFrame({'code': ['126-45', '127-45', '122-56', '122-57', '122-89', '147-56', '258-69', '259-87', '254-69', '148-35', '458-69', '249-47'],
                       'name': ['Trudy', 'Trudy', 'Trudy', 'Trudy', 'Trudy', 'Monk', 'Monk', 'Monk', 'Monk', 'Natalie', 'Natalie', 'Natalie'],
                       'time': ['1/2/2018 2:36PM', '1/2/2018 3:58PM', '1/3/2018 5:25PM', '1/4/2018 6:57PM', '1/4/2018 9:25PM', '1/2/2018 3:58PM', '1/3/2018 4:20PM', '1/3/2018 5:25PM', '1/6/2018 6:57PM', '1/2/2018 5:21PM', '1/2/2018 3:54PM', '1/4/2018 2:26PM']})
    
    df['time'] = pd.to_datetime(df['time'], format='%d/%m/%Y %I:%M%p')
    
    #      code     name                time
    #0   126-45    Trudy 2018-02-01 14:36:00
    #1   127-45    Trudy 2018-02-01 15:58:00
    #2   122-56    Trudy 2018-03-01 17:25:00
    #3   122-57    Trudy 2018-04-01 18:57:00
    #4   122-89    Trudy 2018-04-01 21:25:00
    #5   147-56     Monk 2018-02-01 15:58:00
    #6   258-69     Monk 2018-03-01 16:20:00
    #7   259-87     Monk 2018-03-01 17:25:00
    #8   254-69     Monk 2018-06-01 18:57:00
    #9   148-35  Natalie 2018-02-01 17:21:00
    #10  458-69  Natalie 2018-02-01 15:54:00
    #11  249-47  Natalie 2018-04-01 14:26:00
    

    And if you don't matter about time part, use the following:

    df.groupby(['name', pd.Grouper(key='time', freq='D')])['time', 'code'].max()
    
    #                      code
    #name    time              
    #Monk    2018-02-01  147-56
    #        2018-03-01  259-87
    #        2018-06-01  254-69
    #Natalie 2018-02-01  458-69
    #        2018-04-01  249-47
    #Trudy   2018-02-01  127-45
    #        2018-03-01  122-56
    #        2018-04-01  122-89