So I have a df that looks something like this:
Person_ID Procedure_ID Date(d/m/y)
34 30 03/03/2011
34 30 02/03/2011
32 19 01/01/2020
34 32 01/04/2012
If a Person_ID has the same procedure twice or more, like that 34 - 30 case above, the code needs to keep only the newest row, deleting all the others. In the example I gave the expected result would be:
Person_ID Procedure_ID Date(d/m/y)
34 30 02/03/2011
32 19 01/01/2020
34 32 01/04/2012
Thank you in advance!
Groupby 'Person_ID', 'Procedure_ID'
and get the last element from each group.
Code
df.sort_values(by='Date(d/m/y)').groupby(['Person_ID', 'Procedure_ID'], as_index=False).last()
Output
Person_ID Procedure_ID Date(d/m/y)
0 32 19 01/01/2020
1 34 30 02/03/2011
2 34 32 01/04/2012