
Continuously updated aggregation of the last 5 data points in Python


I need to add a new feature that aggregates the last 5 data points. When a 6th data point is added, it should drop the first one and consider only the 5 most recent values, as shown below. Here is a dummy data frame; new_feature is the expected output.

id    feature    new_feature
1     a          a
2     b          a+b
3     c          a+b+c
4     d          a+b+c+d
5     e          a+b+c+d+e
6     f          b+c+d+e+f
7     g          c+d+e+f+g

Solution

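  • Use Series.rolling with a window of 5 and min_periods=1, then call sum. With min_periods=1, the first four rows are summed over however many values are available so far instead of returning NaN:

    import pandas as pd

    # Sample numeric data standing in for the feature column
    df = pd.DataFrame({'feature':[1,2,4,5,6,2,3,4,5]})
    # Sum the current row and up to 4 preceding rows
    df['new_feature'] = df['feature'].rolling(5, min_periods=1).sum()
    print(df)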
       feature  new_feature
    0        1          1.0
    1        2          3.0
    2        4          7.0
    3        5         12.0
    4        6         18.0
    5        2         19.0
    6        3         20.0
    7        4         20.0
    8        5         20.0
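
To make the window semantics concrete, here is a minimal sketch that recomputes the same sums with explicit slicing and compares them to the rolling result. It then applies the same slicing to the letters a to g to reproduce the symbolic new_feature column from the question. The names window, manual, and symbolic are illustrative only and do not come from the original answer.

```python
import pandas as pd

window = 5
df = pd.DataFrame({"feature": [1, 2, 4, 5, 6, 2, 3, 4, 5]})

# Rolling sum as in the answer above
rolled = df["feature"].rolling(window, min_periods=1).sum().tolist()

# Explicit equivalent: row i covers rows max(0, i - window + 1) .. i inclusive
values = df["feature"].tolist()
manual = [
    float(sum(values[max(0, i - window + 1): i + 1]))
    for i in range(len(values))
]

# Both approaches should agree row by row
matches = all(abs(a - b) < 1e-9 for a, b in zip(rolled, manual))
print(matches)

# Same slicing on the question's labels, joined with "+" to show which
# rows each window contains
labels = list("abcdefg")
symbolic = [
    "+".join(labels[max(0, i - window + 1): i + 1])
    for i in range(len(labels))
]
print(symbolic)
```

The slicing version is only meant for checking the logic on small inputs; for real data the rolling call is the more idiomatic choice.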