I have some noise survey data telling me noise levels measured over the period of several days. I want to find the 5th highest noise level in each night-time period. I have made this into a Pandas Series and used groupby and nlargest methods to show me the 5 highest noise levels each night, but now I want to view only the 5th highest value for each period (i.e. 82, 86, 86, 87 etc.). What's the best way to achieve this?
night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
Start date & time
2021-08-18 2021-08-18 23:00:00 82.0
2021-08-18 23:15:00 82.0
2021-08-18 23:30:00 82.0
2021-08-18 23:45:00 82.0
2021-08-19 2021-08-19 05:45:00 100.0
2021-08-19 01:15:00 91.0
2021-08-19 04:45:00 87.0
2021-08-19 06:15:00 87.0
2021-08-19 01:45:00 86.0
2021-08-20 2021-08-20 06:30:00 90.0
2021-08-20 06:00:00 88.0
2021-08-20 03:15:00 87.0
2021-08-20 05:30:00 87.0
2021-08-20 01:15:00 86.0
2021-08-21 2021-08-21 01:30:00 98.0
2021-08-21 03:00:00 93.0
2021-08-21 00:45:00 88.0
2021-08-21 06:00:00 88.0
2021-08-21 03:30:00 87.0
2021-08-22 2021-08-22 23:45:00 102.0
2021-08-22 00:30:00 96.0
2021-08-22 06:30:00 92.0
2021-08-22 05:00:00 91.0
2021-08-22 01:30:00 90.0
2021-08-23 2021-08-23 01:15:00 98.0
2021-08-23 02:15:00 88.0
2021-08-23 00:45:00 87.0
2021-08-23 03:00:00 86.0
2021-08-23 06:00:00 86.0
2021-08-24 2021-08-24 01:00:00 93.0
2021-08-24 00:30:00 89.0
2021-08-24 06:30:00 87.0
2021-08-24 02:45:00 86.0
2021-08-24 06:00:00 86.0```
I see two options here.
Either sort your data by your value and then take the nth
element per group:
(night_time_lmax.sort_values(by='value_column', ascending=False)
.groupby(by=night_time_lmax.index.date).nth(5)
)
## below gives the same result for shorter syntax:
# (night_time_lmax.sort_values(by='value_column')
# .groupby(by=night_time_lmax.index.date).nth(-5)
# )
Or use a double groupby
, once for the top 5 and once for the last:
(night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
.groupby(by=night_time_lmax.index.date).last()
)