Search code examples
pythonpandasgroup-byapplyrolling-computation

Apply function to a groupby and rolling group


Given a pandas dataframe, group it by 3 levels and apply a function which takes two columns as arguments on a rolling basis:

Code to create the data frame:

import pandas as pd
import numpy as np
df_index = pd.DataFrame({
    'classes': ['A']*3*3 +  ['B']*3*3 +['C']*3*3,
    'names': ['Alex']*3 + ['Alan']*3 + ['Ash']*3 + ['Bart']*3 + ['Blake']*3 + ['Beth']*3 + ['Charlie']*3 + ['Cristine']*3 + ['Cameron']*3,
    'numbers': [0, 1, 2] * 9
})

df_values = pd.DataFrame(data=np.random.normal(), index=pd.date_range(start='2020-01-01', end='2024-01-01'), columns=['net', 'gross']).rename_axis('date').reset_index(drop=False)

df = pd.DataFrame()
for i in range(len(df_index)):
    _df = df_values.copy()
    _df['classes'] = df_index.iloc[i]['classes']
    _df['names'] = df_index.iloc[i]['names']
    _df['numbers'] = df_index.iloc[i]['numbers']
    df = pd.concat((df, _df), axis=0)

df.set_index(['classes','names','numbers','date'], inplace=True)

Some function:

def fun(net, gross):
    return net.mean() / gross.std()

The following does not work. I am looking to groupby, and apply "fun()" in a rolling basis:

df.groupby(['classes', 'names', 'numbers']).rolling(window=500).apply(
    lambda x: fun(net=x['net'], gross=x['gross'])  
)

Thanks.

PS: the real fun() is much complex than the one here so I cannot calculate ".mean()" and ".std()" directly to the groupby.


Solution

  • rolling.apply operates per column, so it isn't directly possible to use multiple columns.

    Instead you could slice one column and use a side effect by slicing the original DataFrame based on the indices of the window:

    (df.groupby(['classes', 'names', 'numbers']).rolling(window=500)['net']
       .apply(lambda x: fun(net=df.loc[x.index, 'net'],
                            gross=df.loc[x.index, 'gross']))
    )
    

    But be aware that the operation might be very slow!