Search code examples
pythonpandasmoving-average

How to include Moving Average with Pandas based on Values on other Columns


I am trying to calculate the Moving Average on the following dataframe but i have trouble joining the result back to the dataframe
The dataframe is : (Moving Average values are displayed in parentheses)

Key1 Key2 Value MovingAverage  
  1    2    1       (Nan)
  1    7    2       (Nan)
  1    8    3       (Nan)
  2    5    1       (Nan)
  2    3    2       (Nan)
  2    2    3       (Nan)
  3    7    1       (Nan)
  3    5    2       (Nan)
  3    8    3       (Nan)
  4    7    1       (1.33)
  4    2    2        (2)
  4    9    3       (Nan)
  5    8    1       (2.33)
  5    3    2       (Nan)
  5    9    3       (Nan)
  6    2    1        (2)
  6    7    2       (1.33)
  6    9    3        (3)

The Code is :

import pandas as pd
d = {'Key1':[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6], 'Key2':[2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],'Value':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)
print(df)
MaDf = df.groupby(['Key2'])['Value'].rolling(window=3).mean().to_frame('mean')
print (MaDf) 

If you run the code it will correctly calculate the Moving Average based on 'Key2' and 'Value' but i can't find the way to correctly reinsert it back to the original dataframe (df)


Solution

  • Remove first level of MultiIndex by Series.reset_index with drop=True for align by second level:

    df['mean'] = (df.groupby('Key2')['Value']
                    .rolling(window=3)
                    .mean()
                    .reset_index(level=0, drop=True))
    print (df)
        Key1  Key2  Value      mean
    0      1     2      1       NaN
    1      1     7      2       NaN
    2      1     8      3       NaN
    3      2     5      1       NaN
    4      2     3      2       NaN
    5      2     2      3       NaN
    6      3     7      1       NaN
    7      3     5      2       NaN
    8      3     8      3       NaN
    9      4     7      1  1.333333
    10     4     2      2  2.000000
    11     4     9      3       NaN
    12     5     8      1  2.333333
    13     5     3      2       NaN
    14     5     9      3       NaN
    15     6     2      1  2.000000
    16     6     7      2  1.333333
    17     6     9      3  3.000000
    

    If default RangeIndex is possible use Series.sort_index:

    df['mean'] = (df.groupby(['Key2'])['Value']
                    .rolling(window=3)
                    .mean()
                    .sort_index(level=1)
                    .values)
    print (df)
        Key1  Key2  Value      mean
    0      1     2      1       NaN
    1      1     7      2       NaN
    2      1     8      3       NaN
    3      2     5      1       NaN
    4      2     3      2       NaN
    5      2     2      3       NaN
    6      3     7      1       NaN
    7      3     5      2       NaN
    8      3     8      3       NaN
    9      4     7      1  1.333333
    10     4     2      2  2.000000
    11     4     9      3       NaN
    12     5     8      1  2.333333
    13     5     3      2       NaN
    14     5     9      3       NaN
    15     6     2      1  2.000000
    16     6     7      2  1.333333
    17     6     9      3  3.000000