I have a MultiIndex dataframe where I need to compute the percentage change column-wise. I used apply
in conjunction with pd.pct_change. This is working as long I don't take into consideration the outer level of the MultiIndex with groupby
.
# Creat pd.MultiIndex and include some NaNs
rng = pd.date_range(start='2018-12-20', periods=20, name='date')
date = np.concatenate([rng, rng])
perm = np.array([*np.repeat(1, 20), *np.repeat(2, 20)])
d = {'perm': perm,
'date': date,
'ser_1': np.random.randint(low=1, high=10, size=[40]),
'ser_2': np.random.randint(low=1, high=10, size=[40])}
df = pd.DataFrame(data=d)
df.iloc[5:8, 2:] = np.nan
df.iloc[11:13, 2] = np.nan
df.iloc[25:28, 2:] = np.nan
df.iloc[33:37, 3] = np.nan
df.set_index(['perm', 'date'], drop=True, inplace=True)
# Apply pd.pct_change to every column individually in order to take care of the
# NaNs at different positions. Also, use groupby for every 'perm'. This one is
# where I am struggling.
# This is working properly, but it doesn't take into account 'perm'. The first
# two rows of perm=2 (i.e. rows 20 and 21) must be NaN.
chg = df.apply(lambda x, periods:
x.dropna().pct_change(periods=2).
reindex(df.index, method='ffill'),
axis=0, periods=2)
# This one is causing an error:
# TypeError: () got an unexpected keyword argument 'axis'
chg = df.groupby('perm').apply(lambda x, periods:
x.dropna().pct_change(periods=2).
reindex(df.index, method='ffill'),
axis=0, periods=2)
The " unexpected keyword argument 'axis' " error comes from the fact that pandas.DataFrame.apply and pandas.core.groupby.GroupBy.apply are two different methods, with similar but different parameters: they have the same name because they are intended to perform a very similar task, but they belong to two different classes.
If you check the documentation, you'll see that the first one require an axis
parameter. The second one not.
So to have a working code with groupby
, just remove the axis
parameter from GroupBy.apply
. Since you want to work column by column because of dropna
, you need to use DataFrame.apply
inside GroupBy.apply
:
chg = df.groupby('perm').apply(lambda x:
x.apply(lambda y : y.dropna().pct_change(periods=2)
.reindex(x.index, method='ffill'),
axis=0))
This produces what you want (first two rows of "perm 2" are NaN
, other numbers are equal to the result you get by using apply
without groupby
).
Note that I've also edit the first argument in reindex
: is x.index
and not df.index
otherwise you'll get a double perm
index in the final result.
Final note, no need to pass a period
argument to the lambda
function if you are setting it hardcoded in pc_change
. Is redundant.