Search code examples
pythonpandasdataframemulti-index

How do I select a subset of a DataFrame based on a condition on one level of a MultiIndex


Similar to How do I select a subset of a DataFrame based on one level of a MultiIndex, let

df = pd.DataFrame({"v":range(12)},
                  index=pd.MultiIndex.from_frame(
                      pd.DataFrame({"name":4*["a"]+4*["b"]+4*["c"],
                                    "rank":[x*x for x in range(12)]})))

and suppose I want to select only rows with the second level rank being within 25 from its smallest value for the given first level name:

            v
name rank    
a    0      0
     1      1
     4      2
     9      3
b    16     4
     25     5
     36     6
c    64     8
     81     9

This time I have no idea how to do that easily (other than a convoluted combination of to_frame and groupby - as explained in How do I select a subset of a DataFrame based on a condition on a column).


Solution

  • Very similar, you can obtain the min rank with reset_index and groupby, then query:

    # select only one column `v` so we don't copy too many data
    min_rank = (df['v'].reset_index('rank')
                       .groupby('name', sort=False)['rank']   # avoid shuffling the dataframe
                       .transform('min')
                       .to_numpy()                            # avoid index alignment
               )
    
    # pass the variable into query string with @
    df.query('rank < @min_rank + 25')
    
    # also 
    # df.loc[df.index.get_level_values('rank') < min_rank + 25]
    

    Output:

               v
    name rank   
    a    0     0
         1     1
         4     2
         9     3
    b    16    4
         25    5
         36    6
    c    64    8
         81    9