python pandas group-by feature-engineering sktime

Using a package to perform a rolling window function with a group by


Can a window function be applied per group with a package such as feature-engine? I have been reading the docs looking for a way to do this. It seems like something that should exist, but I can't find how it's implemented. In plain pandas it looks like this:

import pandas as pd

# create a sample dataframe with groups
df = pd.DataFrame({'group': ['A', 'A','A', 'B', 'B', 'B','B', 'C', 'C', 'C','C'],
                   'value': [1, 2, 3, 4, 5, 6, 7, 8,9,10,11]})

# group the data by the 'group' column and apply a rolling window mean of size 2
rolling_mean = df.groupby('group')['value'].rolling(window=2).mean()

print(rolling_mean)

I am guessing the feature-engine version would look something like this:

from feature_engine.timeseries.forecasting import WindowFeatures

wf = WindowFeatures(
    window_size=3,
    variables=["value"],
    operation=["mean"],
    groupby_cols=["group"]
)

transformed_df = wf.fit_transform(df)

However, I can't find a group-by parameter (something like groupby_cols) in feature-engine. Does one exist?

It would be great to see other ways of standardising feature engineering for time series data like this, perhaps from sktime or any other framework too.


Solution

  • Since you want to apply this operation to each group separately, you can combine pandas' groupby with apply:

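    from feature_engine.timeseries.forecasting import WindowFeatures

    # window of 3 rows, mean of the "value" column
    wf = WindowFeatures(window=3, variables=["value"], functions=["mean"])

    # fit and transform each group on its own, then stitch the pieces back together;
    # same as pd.concat([wf.fit_transform(X) for _, X in df.groupby('group')])
    out = df.groupby('group', group_keys=False).apply(wf.fit_transform)
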

    Output:

    >>> out
       group  value  value_window_3_mean
    0      A      1                  NaN
    1      A      2                  NaN
    2      A      3                  NaN
    3      B      4                  NaN
    4      B      5                  NaN
    5      B      6                  NaN
    6      B      7                  5.0
    7      C      8                  NaN
    8      C      9                  NaN
    9      C     10                  NaN
    10     C     11                  9.0
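
    The NaN pattern above is consistent with a rolling mean that is shifted forward by one row, so each row only sees the three previous values of its own group. As far as I can tell this one-row shift is the default behaviour of WindowFeatures (its periods parameter), but treat that as an assumption and check the docs for your version. As a rough cross-check, here is a pandas-only sketch that reproduces the same column. The helper name lagged_rolling_mean is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})


def lagged_rolling_mean(s: pd.Series, window: int = 3, periods: int = 1) -> pd.Series:
    """Rolling mean over `window` rows, shifted by `periods` rows.

    The shift means a row never includes its own value in the mean.
    """
    return s.rolling(window=window).mean().shift(periods)


out = df.copy()
# transform runs the helper once per group and keeps the original row index,
# so the result can be assigned straight back as a new column
out["value_window_3_mean"] = df.groupby("group")["value"].transform(lagged_rolling_mean)

# rows 6 and 10 hold 5.0 and 9.0; every other row is NaN
print(out)
```

    Because transform keeps the original index, there is no MultiIndex to reset, unlike df.groupby('group')['value'].rolling(...). The feature-engine route is still the better fit if you want the step to live inside a scikit-learn style pipeline.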