Search code examples
python-3.xpandasdataframedata-analysisdata-processing

Filter dataframe on multiple conditions within different columns


I have a sample of the dataframe as given below.

data = {'ID':['A', 'A', 'A', 'A', 'A', 'B','B','B','B'],
    'Date':['2021-2-13', '2021-2-14', '2021-2-14', '2021-2-14', '2021-2-15', '2021-2-14', '2021-2-14', '2021-2-15', '2021-2-15'],
    'Modified_Date':['3/19/2021  6:34:20 PM','3/20/2021  4:57:39 PM', '3/21/2021  4:57:40 PM', '3/22/2021  4:57:57 PM', '3/23/2021  4:57:41 PM',
                    '3/25/2021  11:44:15 PM','3/26/2021  2:16:09 PM', '3/20/2021  2:16:04 PM', '3/21/2021  4:57:40 PM'],
    'Steps': [1000, 1200, 1500, 2000, 1400, 4000, 5000,1000, 3500]}
df1 = pd.DataFrame(data)

df1

enter image description here

This data have to be filtered in such a way that first for 'ID', and then for each 'Date', the latest entry of 'Modified_Date' row has to be selected.
EX: For ID=A, For Date='2021-04-14', The latest/last modified date = '3/22/2021 4:57:57 PM', This row has to be selected.

I have attached the snippet of the how the final dataframe has to look like.

enter image description here

I have been stuck on this for a while.


Solution

  • The easiest/most straighforward is to sort by date and take the last per group:

    (df1.sort_values(by='Modified_Date')
        .groupby(['ID', 'Date'], as_index=False).last()
    )
    

    output:

      ID       Date          Modified_Date  Steps
    0  A  2021-2-13  3/19/2021  6:34:20 PM   1000
    1  A  2021-2-14  3/22/2021  4:57:57 PM   2000
    2  A  2021-2-15  3/23/2021  4:57:41 PM   1400
    3  B  2021-2-14  3/26/2021  2:16:09 PM   5000
    4  B  2021-2-15  3/21/2021  4:57:40 PM   3500