Search code examples
pythonpandas-groupbymulti-index

Python: Compute percentage change for every column in MultiIndex DataFrame in conjunction with groupby


I have a MultiIndex dataframe where I need to compute the percentage change column-wise. I used apply in conjunction with pd.pct_change. This is working as long I don't take into consideration the outer level of the MultiIndex with groupby.

# Creat pd.MultiIndex and include some NaNs

rng = pd.date_range(start='2018-12-20', periods=20, name='date')
date = np.concatenate([rng, rng])
perm = np.array([*np.repeat(1, 20), *np.repeat(2, 20)])

d = {'perm': perm,
     'date': date,
     'ser_1': np.random.randint(low=1, high=10, size=[40]),
     'ser_2': np.random.randint(low=1, high=10, size=[40])}

df = pd.DataFrame(data=d)

df.iloc[5:8, 2:] = np.nan
df.iloc[11:13, 2] = np.nan
df.iloc[25:28, 2:] = np.nan
df.iloc[33:37, 3] = np.nan

df.set_index(['perm', 'date'], drop=True, inplace=True)


# Apply pd.pct_change to every column individually in order to take care of the
# NaNs at different positions. Also, use groupby for every 'perm'. This one is
# where I am struggling.

# This is working properly, but it doesn't take into account 'perm'. The first
# two rows of perm=2 (i.e. rows 20 and 21) must be NaN.
chg = df.apply(lambda x, periods:
               x.dropna().pct_change(periods=2).
               reindex(df.index, method='ffill'),
               axis=0, periods=2)

# This one is causing an error:
# TypeError: () got an unexpected keyword argument 'axis'
chg = df.groupby('perm').apply(lambda x, periods:
                               x.dropna().pct_change(periods=2).
                               reindex(df.index, method='ffill'),
                               axis=0, periods=2)

Solution

  • The " unexpected keyword argument 'axis' " error comes from the fact that pandas.DataFrame.apply and pandas.core.groupby.GroupBy.apply are two different methods, with similar but different parameters: they have the same name because they are intended to perform a very similar task, but they belong to two different classes.
    If you check the documentation, you'll see that the first one require an axis parameter. The second one not.

    So to have a working code with groupby, just remove the axis parameter from GroupBy.apply. Since you want to work column by column because of dropna, you need to use DataFrame.apply inside GroupBy.apply:

    chg = df.groupby('perm').apply(lambda x:
                               x.apply(lambda y : y.dropna().pct_change(periods=2)
                               .reindex(x.index, method='ffill'),
                               axis=0))
    

    This produces what you want (first two rows of "perm 2" are NaN, other numbers are equal to the result you get by using apply without groupby).
    Note that I've also edit the first argument in reindex: is x.index and not df.index otherwise you'll get a double perm index in the final result.

    Final note, no need to pass a period argument to the lambda function if you are setting it hardcoded in pc_change. Is redundant.