I'm trying to filter a pandas dataframe so that I'm able to get the most recent data point for each account number in the dataframe. Here is an example of what the data looks like. I'm looking for an output of one instance of an account with the product and most recent date.
account_number product sale_date
0 123 rental 2021-12-01
1 423 rental 2021-10-01
2 513 sale 2021-11-02
3 123 sale 2022-01-01
4 513 sale 2021-11-30
I was trying to use groupby
and idxmax()
but it doesn't work with dates.
And I did want to change the dtype away from date time.
data_grouped = data.groupby('account_number')['sale_date'].max().idxmax()
Any ideas would be awesome.
It seems the sale_date
column has strings. If you convert it to datetime dtype, then you can use groupby
+ idxmax
:
df['sale_date'] = pd.to_datetime(df['sale_date'])
out = df.loc[df.groupby('account_number')['sale_date'].idxmax()]
Output:
account_number product sale_date
3 123 sale 2022-01-01
1 423 rental 2021-10-01
4 513 sale 2021-11-30