I need to get an .expanding
calculation on a MultiIndex DataFrame. But I need it to run in reverse. Here's a sample DataFrame:
np.random.seed(456)
j = [(a, b) for a in ['A','B','C'] for b in pd.date_range('2018-01-01', periods=5, freq='W')]
i = pd.MultiIndex.from_tuples(j, names=['Name','Num'])
df = pd.DataFrame(np.random.randn(15), i, columns=['Vals'])
df.loc[df['Vals'] < 0] = np.nan
And here is an example of what I want to do for each of the level-0 index values:
k = df.loc['A']
k['Missing'] = k[::-1].isnull().expanding().sum() # Expanding-sum on reversed level-1
This produces the correct results for that one top-level value 'A'
:
Vals Missing
Num
2018-01-07 NaN 2.0
2018-01-14 NaN 1.0
2018-01-21 0.618576 0.0
2018-01-28 0.568692 0.0
2018-02-04 1.350509 0.0
But how do I get that to apply to all top-level index values, so I can set df['Missing'] =
?
I was trying things of the form df.groupby('Name')[::-1].isnull().expanding().sum()
... but I can't get a functional form that allows the level-1 index to be reversed for the calculation.
What is a pandaic expression to do this?
Got it: After grouping we have to strip the top level off the MultiIndex before working on the inner set:
df['Missing'] = df.groupby('Name').apply(
lambda x: x.reset_index(level=0, drop=True)[::-1].isnull().expanding().sum()
)
I.e., we groupby('Name')
, and then for each group the lambda expression strips the level-0 index via .reset_index(level=0, drop=True)
, at which point we can use the remaining DataFrame in reverse order: x[::-1]
.