Search code examples
pythonpandasdatefilterpandas-groupby

Filter for most recent event by group with pandas


I'm trying to filter a pandas dataframe so that I'm able to get the most recent data point for each account number in the dataframe. Here is an example of what the data looks like. I'm looking for an output of one instance of an account with the product and most recent date.

account_number product   sale_date
0             123  rental  2021-12-01
1             423  rental  2021-10-01
2             513    sale  2021-11-02
3             123    sale  2022-01-01
4             513    sale  2021-11-30

I was trying to use groupby and idxmax() but it doesn't work with dates.

And I did want to change the dtype away from date time.

data_grouped = data.groupby('account_number')['sale_date'].max().idxmax()

Any ideas would be awesome.


Solution

  • It seems the sale_date column has strings. If you convert it to datetime dtype, then you can use groupby + idxmax:

    df['sale_date'] = pd.to_datetime(df['sale_date'])
    out = df.loc[df.groupby('account_number')['sale_date'].idxmax()]
    

    Output:

       account_number product  sale_date
    3             123    sale 2022-01-01
    1             423  rental 2021-10-01
    4             513    sale 2021-11-30