I have a very big data frame with the orders for some products, each with a reference. This reference gets periodic updates, so for the same product there are many rows in the data frame. I want to choose the last update for each reference, but I don't know how.
For example, one reference has 10 updates and another has 34, so there is no fixed pattern...
Any ideas?
I assume it should be something like this:
df.sort_values("update_date",ascending=False).groupby("reference").first()
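You first sort the data frame by update_date in descending order, then group it by reference and, for each reference, take the first record (that is, the most recent update).
Note that GroupBy.first() returns the first non-null value of each column separately, so if the latest row has missing values, they are filled in from older rows. If you need the latest row exactly as it is, use .head(1) instead of .first(), or call .drop_duplicates("reference") on the sorted frame.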
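A minimal, self-contained sketch of the approach above, using made-up sample data (the `quantity` column and all values are hypothetical). It also shows `idxmax` as an alternative that keeps the original row index and takes every column from the same row.

```python
import pandas as pd

# Hypothetical sample data: each reference has a different number of updates.
df = pd.DataFrame(
    {
        "reference": ["A", "A", "A", "B", "B"],
        "update_date": pd.to_datetime(
            ["2023-01-01", "2023-03-01", "2023-02-01", "2023-01-15", "2023-04-10"]
        ),
        "quantity": [10, 30, 20, 5, 7],
    }
)

# Approach from the answer: newest first, then the first record per reference.
# The result is indexed by reference.
latest = df.sort_values("update_date", ascending=False).groupby("reference").first()

# Alternative: idxmax gives the index label of the newest row in each group,
# and .loc selects those whole rows, keeping the original index.
latest_rows = df.loc[df.groupby("reference")["update_date"].idxmax()]

print(latest)
print(latest_rows)
```

Both give quantity 30 for reference A and 7 for reference B on this sample; the `idxmax` version avoids the column-by-column null filling of `first()`.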