Tags: python, pandas, average, scaling

How to create new column based on mean of specific window (number of rows)?


I have a dataframe like this:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"ID":[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   "A":[30, 20, 10, 20, 60, 80, 90, 70, 120, 150, 120, 140]})

I would like to create a new column "B" holding the mean of df["A"] over each consecutive block of 4 rows, with that mean repeated on all 4 rows of the block. The result should look like this:

 df
Out[6]: 
    ID    A      B
0    1   30   20.0
1    2   20   20.0
2    3   10   20.0
3    4   20   20.0
4    5   60   75.0
5    6   80   75.0
6    7   90   75.0
7    8   70   75.0
8    9  120  132.5
9   10  150  132.5
10  11  120  132.5
11  12  140  132.5

I tried df["B"] = df.rolling(window=4)['A'].mean(), but it didn't work as expected. Could anyone help me?


Solution

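  • You can't use rolling here: it computes a sliding (overlapping) window that advances one row at a time, whereas you need fixed, non-overlapping blocks of 4 rows.

    Instead, floor-divide a positional range by 4 to label each block (0, 0, 0, 0, 1, 1, 1, 1, ...) and use that as the grouper for groupby.transform('mean'):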

    import numpy as np
    
    df['B'] = df.groupby(np.arange(len(df))//4)['A'].transform('mean')
    

    Alternatively, use df.index//4 in place of np.arange(len(df))//4 if your index is already a default range index (0, 1, 2, ...), as in your example.

    Output:

        ID    A      B
    0    1   30   20.0
    1    2   20   20.0
    2    3   10   20.0
    3    4   20   20.0
    4    5   60   75.0
    5    6   80   75.0
    6    7   90   75.0
    7    8   70   75.0
    8    9  120  132.5
    9   10  150  132.5
    10  11  120  132.5
    11  12  140  132.5
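
To make the difference between the two approaches concrete, here is a minimal sketch that wraps the grouping in a small helper and puts the rolling result next to it. The helper name block_mean and the extra column name "rolling" are only illustrative, not part of pandas or of the original question.

```python
import numpy as np
import pandas as pd


def block_mean(series: pd.Series, size: int) -> pd.Series:
    """Mean over consecutive, non-overlapping blocks of `size` rows,
    repeated on every row of the block (hypothetical helper)."""
    # Positional block labels 0,0,0,0,1,1,1,1,... (independent of the index)
    blocks = np.arange(len(series)) // size
    return series.groupby(blocks).transform("mean")


df = pd.DataFrame({"ID": range(1, 13),
                   "A": [30, 20, 10, 20, 60, 80, 90, 70, 120, 150, 120, 140]})

# Fixed blocks of 4 rows: one mean per block, broadcast to its rows
df["B"] = block_mean(df["A"], 4)

# The attempt from the question, for comparison: the window slides by one
# row, so the first 3 rows are NaN and every later row gets its own mean
df["rolling"] = df["A"].rolling(window=4).mean()

print(df)
```

If the number of rows is not a multiple of the block size, the last block is simply shorter and its mean is taken over the rows that remain.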