Search code examples
pythonpandasdataframemoving-average

Calculate 2-month rolling average for group in Pandas


I'm trying to calculate a two-month rolling average of a player's 'score' using the below df:

df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116],
              'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25', 
                      '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25',
                      '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'],
              'score': [-2,1,2,3,-8,3,2,-3,-2,1,2,3]})

I'd want the averages to be grouped by the 'player_ID' column so that player 1098 would have averages of -2, -0.5, 0.33, and 1 for each row, respectively. If there's just one date in 2020, it'd just be that value. If there's a date in January and February of 2020, the January row would just be that value while the Feb row would be the average of those two.


Solution

  • You can group by 'player_id' and use expanding().mean() method on 'score' column:

    df['rolling mean'] = df.groupby('player_id')['score'].expanding().mean().round(2).droplevel(0)
    

    Edit:

    Given the new information in the comments, perhaps you want rolling.mean method. Added some rows to the dataframe from the OP to better show what's going on. For the DataFrame df:

    df = pd.DataFrame({'player_id': [1098, 1098, 1098, 1098, 1098, 1098, 1098, 1098, 
                                     1116, 1116, 1116, 1116, 1116, 1116, 1116, 1116], 
                       'date': ['2018-06-22', '2018-06-23', '2018-07-24', '2018-07-25', 
                                '2019-06-22', '2019-06-25', '2019-07-25', '2020-06-22', 
                                '2018-07-22', '2018-07-23', '2018-07-24', '2018-07-25', 
                                '2018-08-22', '2018-08-23', '2018-08-24', '2018-08-25'], 
                       'score': [-2, 1, 2, 3, 7, 8, 6, 5, -8, 3, 2, -3, -2, 1, 2, 3]})
    

    Here we find the rolling mean over 60 days for each 'player_id':

    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values(by=['player_id','date'])
    df['rolling_mean'] = df.set_index('date').groupby('player_id', sort=False)['score'].rolling('60D').mean().round(2).to_numpy()
    

    Output:

        player_id       date  score  rolling_mean
    0        1098 2018-06-22     -2         -2.00
    1        1098 2018-06-23      1         -0.50
    2        1098 2018-07-24      2          0.33
    3        1098 2018-07-25      3          1.00
    6        1098 2019-06-22      7          7.00
    7        1098 2019-06-25      8          7.50
    5        1098 2019-07-25      6          7.00
    4        1098 2020-06-22      5          5.00
    8        1116 2018-07-22     -8         -8.00
    9        1116 2018-07-23      3         -2.50
    10       1116 2018-07-24      2         -1.00
    11       1116 2018-07-25     -3         -1.50
    12       1116 2018-08-22     -2         -1.60
    13       1116 2018-08-23      1         -1.17
    14       1116 2018-08-24      2         -0.71
    15       1116 2018-08-25      3         -0.25