Search code examples
pandasdataframepandas-groupbymulti-index

Apply a pandas window function to a reversed MultiIndex level?


I need to get an .expanding calculation on a MultiIndex DataFrame. But I need it to run in reverse. Here's a sample DataFrame:

np.random.seed(456)
j = [(a, b) for a in ['A','B','C'] for b in pd.date_range('2018-01-01', periods=5, freq='W')]
i = pd.MultiIndex.from_tuples(j, names=['Name','Num'])
df = pd.DataFrame(np.random.randn(15), i, columns=['Vals'])
df.loc[df['Vals'] < 0] = np.nan

And here is an example of what I want to do for each of the level-0 index values:

k = df.loc['A']
k['Missing'] = k[::-1].isnull().expanding().sum() # Expanding-sum on reversed level-1

This produces the correct results for that one top-level value 'A':

                Vals  Missing
Num                          
2018-01-07       NaN      2.0
2018-01-14       NaN      1.0
2018-01-21  0.618576      0.0
2018-01-28  0.568692      0.0
2018-02-04  1.350509      0.0

But how do I get that to apply to all top-level index values, so I can set df['Missing'] =?

I was trying things of the form df.groupby('Name')[::-1].isnull().expanding().sum() ... but I can't get a functional form that allows the level-1 index to be reversed for the calculation.

What is a pandaic expression to do this?


Solution

  • Got it: After grouping we have to strip the top level off the MultiIndex before working on the inner set:

    df['Missing'] = df.groupby('Name').apply(
            lambda x: x.reset_index(level=0, drop=True)[::-1].isnull().expanding().sum()
        )
    

    I.e., we groupby('Name'), and then for each group the lambda expression strips the level-0 index via .reset_index(level=0, drop=True), at which point we can use the remaining DataFrame in reverse order: x[::-1].