How to group values of pandas dataframe and select the latest (by date) from each group?
For example, given a dataframe sorted by date:
id product date
0 220 6647 2014-09-01
1 220 6647 2014-09-03
2 220 6647 2014-10-16
3 826 3380 2014-11-11
4 826 3380 2014-12-09
5 826 3380 2015-05-19
6 901 4555 2014-09-01
7 901 4555 2014-10-05
8 901 4555 2014-11-01
grouping by id or product, and selecting the latest gives:
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01
use idxmax
in groupby
and slice df
with loc
df.loc[df.groupby('id').date.idxmax()]
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01