Search code examples
pythonpandaspandas-groupbyaggregation

How do I aggregate rows in a pandas dataframe according to the latest dates in a column?


I have a dataframe containing materials, dates of purchase and purchase prices. I want to filter my dataframe such that I only keep one row containing each material, and that row contains the material at the latest purchase date and corresponding price.

How could I achieve this? I have racked my brains trying to work out how to apply aggregation functions to this but I just can't work out how.


Solution

  • Do a multisort and then use drop duplicates, keeping the first occurrence.

    import pandas as pd
    
    df.sort_values(by=['materials', 'purchase_date'], ascending=[True, False], inplace=True)
    df.drop_duplicates(subset=['materials'], keep='first', inplace=True)