Pandas dataframe: How to create multiple columns of different lagged values with groubpy in one call?

End goal: have n lagged (named) feature values in every row of a pandas dataframe respecting group

Secondary goal: No for loops in pandas call

A naive implementation could be:

n = 7
for feature in features:
    for i in range(1, n + 1):
        df[feature + "_" + str(i)] = df.groupby(["a", "b"])[feature].shift(i)

I would like to have something like this instead, saving len(features) * n - 1 calls to the pandas api:

lagged_features = {}
n = 7
for feature in features:
    for i in range(1, n + 1):
        key = feature + "_" + str(i)
        lagged_features[key] = pd.NamedAgg(feature, lambda x: x.shift(i))

df_lagged = df.groupby(["a", "b"]).agg(**lagged_features)

However agg doesn't like shift :(

And you get a ValueError: Must produce aggregated value

I believe you should be able to use apply? But, I can only think of a way to get the number of calls down to n (much better, but it would be so much cleaner if you could do something like agg)

Solution

Your initial approach attempted to use pandas.DataFrame.groupby().agg(), but, .agg() method requires functions that return a single value per group, and shift() does not meet this requirement as it returns a Series instead.

To solve this problem, you can use pandas.DataFrame.groupby().apply() as you mentioned. This function allows you to apply a function that returns a DataFrame to each group.

Here is an example. It's a solution where you only need to call the pandas API once for each group:

import pandas as pd

n = 2
features = ['feature1', 'feature2']

def lag_features(grp):
    lags = {f'{feature}_{i}': grp[feature].shift(i) for feature in features for i in range(1, n + 1)}
    return pd.DataFrame(lags)

df_lagged = df.groupby(["a", "b"]).apply(lag_features).reset_index(level=2, drop=True)

Explanation

lag_features(grp) is a helper function that gets applied to each group. It takes a DataFrame (a single group) as input. Inside this function, a dictionary comprehension is used to generate the lagged features for each feature in the group. This dictionary is then transformed into DataFrame, which is returned by the function.
df.groupby(["a", "b"]).apply(lag_features) groups the DataFrame by a and b, then applies lag_features(grp) to each group. This gives us a DataFrame with multi-index where the 3-rd level of the index is the original DataFrame's index.
reset_index(level=2, drop=True) is used to remove this 3rd level of the index