I have DataFrame in Python Pandas like below ("date_col" is in "datetime64" format):
ID | date_col | purchase
----|------------|-------
111 | 2019-01-05 | apple
111 | 2019-05-22 | onion
222 | 2020-11-04 | banana
333 | 2020-04-19 | orange
I need to aggregate above table in the following way:
So as a result I need something like below:
ID | date_col | purchase | col1
----|------------|----------|-----
111 | 2019-05-22 | onion | 2
222 | 2020-11-04 | banana | 1
333 | 2020-04-19 | orange | 1
Assuming the dataframe is sorted on date_col
column, you can use groupby
:
g = df.groupby('ID', as_index=False)
g.last().merge(g.size())
ID date_col purchase size
0 111 2019-05-22 onion 2
1 222 2020-11-04 banana 1
2 333 2020-04-19 orange 1