Search code examples
pythonpandasconditional-statementsmoving-average

Calculate & Rank 5-Day Moving Averages in Panda Multiple Column Event Data


I need to find and rank the highest to lowest 5-day averages of hourly data using multiple panda columns that characterize an "event". My data in df dataframe looks like this below. The outcome will show the ranked consecutive day "event" of 5-day consecutive day average of positive wind, minimum temp, and highest rh. I am not sure how to calculate consecutive 5-day averages and then how to rank by multiple conditions. Thank you!

     site     year     month     day     wind    temp    rh 
0    A        1991     1         1       5.3     2.1     80.4
1    A        1991     1         2       12.6    -1.4    85.0
2    A        1991     1         3       14.7    -2.6    95.1   
3    A        1991     1         4       11.8     4.8    57.3
4    A        1991     1         5       5.2     2.9     45.9
5    A        1991     1         6       3.9     4.3     52.1
6    A        1991     1         7       2.6     5.8     34.7
7    A        1991     1         8       2.9     5.7     29.2
8    A        1991     1         9       10.4    1.4     69.4
9    A        1991     1         10      14.6    -0.9    72.1
10   A        1991     1         11      13.9    -1.6    84.6
11   A        1991     1         12      14.5    -5.1    87.2
12   A        1991     1         13      12.8    -6.7    80.9
13   A        1991     1         14      8.4     -4.3    54.3
14   A        1991     1         15      5.7     0.7     44.8

I have tried using different options of the rolling mean like this below but get "list assignment index out of range" error:

df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh']).rolling(window=5).mean()

The 5-day rolling average should look like this:

    site    year    month   day wind    temp    rh
0   A   1991    1           1   n/a      n/a    n/a
1   A   1991    1           2   n/a      n/a    n/a
2   A   1991    1           3   n/a      n/a    n/a
3   A   1991    1           4   n/a      n/a    n/a
4   A   1991    1           5   9.92    1.16    72.74
5   A   1991    1           6   9.64    1.6     67.08
6   A   1991    1           7   7.64    3.04    57.02
7   A   1991    1           8   5.28    4.7     43.84
8   A   1991    1           9   5       4.02    46.26
9   A   1991    1           10  6.88    3.26    51.5
10  A   1991    1           11  8.88    2.08    58
11  A   1991    1           12  11.26   -0.1    68.5
12  A   1991    1           13  13.24   -2.58   78.84
13  A   1991    1           14  12.84   -3.72   75.82
14  A   1991    1           15  11.06   -3.4    70.36

And, the final output should look like this with ranking priority for wind, temp, rh in that order:

    site    year    month   day wind    temp    rh
0   A   1991    1           1   n/a     n/a     n/a
1   A   1991    1           2   n/a     n/a     n/a
2   A   1991    1           3   n/a     n/a     n/a
3   A   1991    1           4   n/a     n/a     n/a
12  A   1991    1           13  13.24   -2.58   78.84
13  A   1991    1           14  12.84   -3.72   75.82
11  A   1991    1           12  11.26   -0.1    68.5
14  A   1991    1           15  11.06   -3.4    70.36
4   A   1991    1           5   9.92    1.16    72.74
5   A   1991    1           6   9.64    1.6     67.08
10  A   1991    1           11  8.88    2.08    58
6   A   1991    1           7   7.64    3.04    57.02
9   A   1991    1           10  6.88    3.26    51.5
7   A   1991    1           8   5.28    4.7     43.84
8   A   1991    1           9   5       4.02    46.26

Solution

  • Try rolling mean + sort_values with na_position being first:

    import pandas as pd
    
    d = {'site': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A',
                  8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A'},
         'year': {0: 1991, 1: 1991, 2: 1991, 3: 1991, 4: 1991, 5: 1991, 6: 1991,
                  7: 1991, 8: 1991, 9: 1991, 10: 1991, 11: 1991, 12: 1991, 13: 1991,
                  14: 1991},
         'month': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
                   10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
         'day': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
                 10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
         'wind': {0: 5.3, 1: 12.6, 2: 14.7, 3: 11.8, 4: 5.2, 5: 3.9, 6: 2.6, 7: 2.9,
                  8: 10.4, 9: 14.6, 10: 13.9, 11: 14.5, 12: 12.8, 13: 8.4, 14: 5.7},
         'temp': {0: 2.1, 1: -1.4, 2: -2.6, 3: 4.8, 4: 2.9, 5: 4.3, 6: 5.8, 7: 5.7,
                  8: 1.4, 9: -0.9, 10: -1.6, 11: -5.1, 12: -6.7, 13: -4.3, 14: 0.7},
         'rh': {0: 80.4, 1: 85.0, 2: 95.1, 3: 57.3, 4: 45.9, 5: 52.1, 6: 34.7,
                7: 29.2, 8: 69.4, 9: 72.1, 10: 84.6, 11: 87.2, 12: 80.9, 13: 54.3,
                14: 44.8}}
    df = pd.DataFrame(data=d)
    
    cols = ['wind', 'temp', 'rh']
    
    df[cols] = df[cols].rolling(window=5).mean()
    
    df = df.sort_values(cols, ascending=False, na_position='first')
    
    print(df)
    

    df:

       site  year  month  day   wind  temp     rh
    0     A  1991      1    1    NaN   NaN    NaN
    1     A  1991      1    2    NaN   NaN    NaN
    2     A  1991      1    3    NaN   NaN    NaN
    3     A  1991      1    4    NaN   NaN    NaN
    12    A  1991      1   13  13.24 -2.58  78.84
    13    A  1991      1   14  12.84 -3.72  75.82
    11    A  1991      1   12  11.26 -0.10  68.50
    14    A  1991      1   15  11.06 -3.40  70.36
    4     A  1991      1    5   9.92  1.16  72.74
    5     A  1991      1    6   9.64  1.60  67.08
    10    A  1991      1   11   8.88  2.08  58.00
    6     A  1991      1    7   7.64  3.04  57.02
    9     A  1991      1   10   6.88  3.26  51.50
    7     A  1991      1    8   5.28  4.70  43.84
    8     A  1991      1    9   5.00  4.02  46.26