python, analysis

How to select the last value in an irregular data frame


I have a very big data frame with the orders for some products, each identified by a reference. Each reference gets periodic updates, so the same product has many rows in the data frame. I want to select the last update for each reference, but I don't know how.

For example, one reference has 10 updates and another has 34, so there is no pattern...

Any ideas?


Solution

  • I assume it should be something like this:

    df.sort_values("update_date",ascending=False).groupby("reference").first()
    

    You first sort the data frame by update_date in descending order, then group it by reference, and for each reference take the first record, which is the most recent update.
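
To make the idea concrete, here is a minimal sketch on made-up data. The column names reference and update_date come from the one-liner above; the quantity column and all values are hypothetical, since the question does not show the real data. It also shows a possible alternative based on idxmax, which is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: reference "A" has 3 updates, "B" has 2.
df = pd.DataFrame({
    "reference": ["A", "A", "A", "B", "B"],
    "update_date": pd.to_datetime(
        ["2021-01-01", "2021-03-01", "2021-02-01", "2021-01-15", "2021-01-10"]
    ),
    "quantity": [10, 30, 20, 5, 7],  # assumed payload column
})

# Approach from the answer: sort newest first, then take the first
# record of each reference group.
latest = (
    df.sort_values("update_date", ascending=False)
      .groupby("reference")
      .first()
)

# Alternative sketch: look up the row label of the newest update_date
# per reference and select those rows directly.
latest_rows = df.loc[df.groupby("reference")["update_date"].idxmax()]

print(latest)
print(latest_rows)
```

One thing to be aware of: groupby(...).first() takes the first non-null value of each column, so if the newest row has missing values they may be filled from an older row. If you need the newest row exactly as it is, the idxmax variant (or sorting and then calling drop_duplicates("reference")) is probably the safer choice.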