I need to find and rank the highest to lowest 5-day averages of hourly data using multiple panda columns that characterize an "event". My data in df dataframe looks like this below. The outcome will show the ranked consecutive day "event" of 5-day consecutive day average of positive wind, minimum temp, and highest rh. I am not sure how to calculate consecutive 5-day averages and then how to rank by multiple conditions. Thank you!
site year month day wind temp rh
0 A 1991 1 1 5.3 2.1 80.4
1 A 1991 1 2 12.6 -1.4 85.0
2 A 1991 1 3 14.7 -2.6 95.1
3 A 1991 1 4 11.8 4.8 57.3
4 A 1991 1 5 5.2 2.9 45.9
5 A 1991 1 6 3.9 4.3 52.1
6 A 1991 1 7 2.6 5.8 34.7
7 A 1991 1 8 2.9 5.7 29.2
8 A 1991 1 9 10.4 1.4 69.4
9 A 1991 1 10 14.6 -0.9 72.1
10 A 1991 1 11 13.9 -1.6 84.6
11 A 1991 1 12 14.5 -5.1 87.2
12 A 1991 1 13 12.8 -6.7 80.9
13 A 1991 1 14 8.4 -4.3 54.3
14 A 1991 1 15 5.7 0.7 44.8
I have tried using different options of the rolling mean like this below but get "list assignment index out of range" error:
df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh']).rolling(window=5).mean()
The 5-day rolling average should look like this:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
6 A 1991 1 7 7.64 3.04 57.02
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
9 A 1991 1 10 6.88 3.26 51.5
10 A 1991 1 11 8.88 2.08 58
11 A 1991 1 12 11.26 -0.1 68.5
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
14 A 1991 1 15 11.06 -3.4 70.36
And, the final output should look like this with ranking priority for wind, temp, rh in that order:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.1 68.5
14 A 1991 1 15 11.06 -3.4 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
10 A 1991 1 11 8.88 2.08 58
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.5
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
Try rolling mean + sort_values with na_position being first:
import pandas as pd
d = {'site': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A',
8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A'},
'year': {0: 1991, 1: 1991, 2: 1991, 3: 1991, 4: 1991, 5: 1991, 6: 1991,
7: 1991, 8: 1991, 9: 1991, 10: 1991, 11: 1991, 12: 1991, 13: 1991,
14: 1991},
'month': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
'day': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
'wind': {0: 5.3, 1: 12.6, 2: 14.7, 3: 11.8, 4: 5.2, 5: 3.9, 6: 2.6, 7: 2.9,
8: 10.4, 9: 14.6, 10: 13.9, 11: 14.5, 12: 12.8, 13: 8.4, 14: 5.7},
'temp': {0: 2.1, 1: -1.4, 2: -2.6, 3: 4.8, 4: 2.9, 5: 4.3, 6: 5.8, 7: 5.7,
8: 1.4, 9: -0.9, 10: -1.6, 11: -5.1, 12: -6.7, 13: -4.3, 14: 0.7},
'rh': {0: 80.4, 1: 85.0, 2: 95.1, 3: 57.3, 4: 45.9, 5: 52.1, 6: 34.7,
7: 29.2, 8: 69.4, 9: 72.1, 10: 84.6, 11: 87.2, 12: 80.9, 13: 54.3,
14: 44.8}}
df = pd.DataFrame(data=d)
cols = ['wind', 'temp', 'rh']
df[cols] = df[cols].rolling(window=5).mean()
df = df.sort_values(cols, ascending=False, na_position='first')
print(df)
df
:
site year month day wind temp rh
0 A 1991 1 1 NaN NaN NaN
1 A 1991 1 2 NaN NaN NaN
2 A 1991 1 3 NaN NaN NaN
3 A 1991 1 4 NaN NaN NaN
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.10 68.50
14 A 1991 1 15 11.06 -3.40 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.60 67.08
10 A 1991 1 11 8.88 2.08 58.00
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.50
7 A 1991 1 8 5.28 4.70 43.84
8 A 1991 1 9 5.00 4.02 46.26