python pandas dataframe max

How to calculate max values in a dataframe column while removing duplicates in another column?


I have a dataset containing hourly temperatures for a year, so there are 24 entries for each day (one temperature per hour). I want to find the 5 days with the highest temperature. I know nlargest() returns the 5 largest values, but those values all happen to fall on a single day. How do I find the 5 largest values on different days?

I tried using nlargest() and .loc but could not find a solution. Please help.

I have attached what the dataset looks like.


Solution

  • You might want to get the max per day with groupby.max, then find the top 5 of those daily maxima with nlargest:

    df.groupby(['year','month','day'])['temp'].max().nlargest(5)
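Since the screenshot of the dataset is not reproduced here, the following is only a minimal sketch on made-up data. It assumes the frame has integer columns named year, month, day and a numeric column named temp, as the one-liner above implies; the helper name top_days and the sample values are hypothetical. It first collapses the hourly rows to one maximum per day, then takes the largest of those daily maxima, so each day can appear at most once.

```python
import numpy as np
import pandas as pd


def top_days(df, n=5, value_col="temp", keys=("year", "month", "day")):
    """Return the n largest daily maxima, one row per day (hypothetical helper)."""
    # Step 1: reduce the 24 hourly rows of each day to a single maximum.
    daily_max = df.groupby(list(keys))[value_col].max()
    # Step 2: rank the daily maxima and keep the n largest.
    return daily_max.nlargest(n)


# Made-up data: 7 days of hourly readings starting 2021-07-01.
timestamps = pd.Timestamp("2021-07-01") + pd.to_timedelta(np.arange(7 * 24), unit="h")
peaks = np.array([30, 35, 28, 40, 33, 38, 31])  # assumed peak temperature per day
day_index = np.arange(7 * 24) // 24
# Each day peaks at 14:00 and drops by one degree per hour away from it.
temp = peaks[day_index] - np.abs(np.asarray(timestamps.hour) - 14)

df = pd.DataFrame(
    {
        "year": timestamps.year,
        "month": timestamps.month,
        "day": timestamps.day,
        "temp": temp,
    }
)

# Plain nlargest on the hourly column can return several hours of the same day.
hourly_top = df.nlargest(5, "temp")

# Grouping first guarantees five different days.
result = top_days(df)
print(result)
```

If the data has a single datetime column instead of separate year, month and day columns, grouping by that column's .dt.date should give the same per-day reduction.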