python pandas datetime analytics

Keep only the latest row for each date using pandas


I have a dataset that looks like this:

       DateTime             Value
2019-06-05 12:10:31.467 
2019-06-05 12:27:30.383    13371.6  
2019-06-05 12:49:31.140    14020.4  
2019-06-05 14:58:16.503    14045.5  
2019-06-06 02:58:16.353    14095.8  
2019-06-06 13:21:49.570    14020.4  
2019-06-06 14:51:58.610    14095.8  
2019-06-06 14:58:13.093    13066.3  
2019-06-07 02:58:10.073    13273.4

I want pandas code that keeps only the latest row for each date:

       DateTime             Value   
2019-06-05 14:58:16.503    14045.5  
2019-06-06 14:58:13.093    13066.3  
2019-06-07 02:58:10.073    13273.4

Solution

  • Run:

    df.groupby(df.DateTime.dt.date).last().reset_index(drop=True)
    

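    This groups the rows by the date part of DateTime and takes the last row of each group. It assumes that DateTime has a datetime dtype (convert it with pd.to_datetime otherwise) and that the rows are already sorted by DateTime, as in the sample. Note that last() returns the last non-null value of each column, so a group whose final Value is missing would report an earlier value instead.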

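    The final reset_index(drop=True) discards the index that groupby builds from the grouping key (the date). The date is already part of the DateTime column in the result, so the index would only repeat it.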

    Grouping by df.DateTime.dt.day would be wrong, although the sample data does not reveal it. The day attribute is only the day of the month, so if the source data spanned several months, all rows from the first day of every month would fall into the same group, and so on.
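
To check the answer end to end, here is a minimal sketch that rebuilds the sample from the question (without the first row, whose value is missing) and applies the same groupby. The explicit sort_values call is an added precaution, not part of the original answer, and the 5 July row is a hypothetical record added only to show how dt.day and dt.date differ.

```python
import pandas as pd

# Sample rows from the question; DateTime must be datetime64 for .dt to work.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2019-06-05 12:27:30.383",
        "2019-06-05 12:49:31.140",
        "2019-06-05 14:58:16.503",
        "2019-06-06 02:58:16.353",
        "2019-06-06 13:21:49.570",
        "2019-06-06 14:51:58.610",
        "2019-06-06 14:58:13.093",
        "2019-06-07 02:58:10.073",
    ]),
    "Value": [13371.6, 14020.4, 14045.5, 14095.8,
              14020.4, 14095.8, 13066.3, 13273.4],
})

# Precaution (assumption): sort so that "last" really means "latest".
df = df.sort_values("DateTime")

# Group by calendar date and keep the last row of each group.
latest = df.groupby(df["DateTime"].dt.date).last().reset_index(drop=True)
print(latest)

# Pitfall check: add a hypothetical row from 5 July (not in the question's data).
extra = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-07-05 09:00:00"]),
    "Value": [1.0],
})
df2 = pd.concat([df, extra], ignore_index=True)

# dt.day merges 5 June and 5 July into one group; dt.date keeps them apart.
by_day = df2.groupby(df2["DateTime"].dt.day).last()
by_date = df2.groupby(df2["DateTime"].dt.date).last()
print(len(by_day), len(by_date))
```

With the added July row, the dt.day grouping yields fewer groups than the dt.date grouping, because 5 June and 5 July share the same day-of-month key.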