Search code examples
pythonpandasseries

Select only the nth largest value in a Series, for each day


I have some noise survey data telling me noise levels measured over the period of several days. I want to find the 5th highest noise level in each night-time period. I have made this into a Pandas Series and used groupby and nlargest methods to show me the 5 highest noise levels each night, but now I want to view only the 5th highest value for each period (i.e. 82, 86, 86, 87 etc.). What's the best way to achieve this?

night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
            Start date & time  
2021-08-18  2021-08-18 23:00:00     82.0
            2021-08-18 23:15:00     82.0
            2021-08-18 23:30:00     82.0
            2021-08-18 23:45:00     82.0
2021-08-19  2021-08-19 05:45:00    100.0
            2021-08-19 01:15:00     91.0
            2021-08-19 04:45:00     87.0
            2021-08-19 06:15:00     87.0
            2021-08-19 01:45:00     86.0
2021-08-20  2021-08-20 06:30:00     90.0
            2021-08-20 06:00:00     88.0
            2021-08-20 03:15:00     87.0
            2021-08-20 05:30:00     87.0
            2021-08-20 01:15:00     86.0
2021-08-21  2021-08-21 01:30:00     98.0
            2021-08-21 03:00:00     93.0
            2021-08-21 00:45:00     88.0
            2021-08-21 06:00:00     88.0
            2021-08-21 03:30:00     87.0
2021-08-22  2021-08-22 23:45:00    102.0
            2021-08-22 00:30:00     96.0
            2021-08-22 06:30:00     92.0
            2021-08-22 05:00:00     91.0
            2021-08-22 01:30:00     90.0
2021-08-23  2021-08-23 01:15:00     98.0
            2021-08-23 02:15:00     88.0
            2021-08-23 00:45:00     87.0
            2021-08-23 03:00:00     86.0
            2021-08-23 06:00:00     86.0
2021-08-24  2021-08-24 01:00:00     93.0
            2021-08-24 00:30:00     89.0
            2021-08-24 06:30:00     87.0
            2021-08-24 02:45:00     86.0
            2021-08-24 06:00:00     86.0```

Solution

  • I see two options here.

    Either sort your data by your value and then take the nth element per group:

    (night_time_lmax.sort_values(by='value_column', ascending=False)
                    .groupby(by=night_time_lmax.index.date).nth(5)
    )
    
    ## below gives the same result for shorter syntax:
    # (night_time_lmax.sort_values(by='value_column')
    #                 .groupby(by=night_time_lmax.index.date).nth(-5)
    # )
    

    Or use a double groupby, once for the top 5 and once for the last:

    (night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
                    .groupby(by=night_time_lmax.index.date).last()
    )