Search code examples
pythonpandasselectmulti-index

A more convenient way, than 'query', to select MultiIndexed rows, designating partial labels by the name of their index levels?


Let's say you have a lot of named index levels -- I'll show 4 here but use your imagination:

midx = pd.MultiIndex.from_product([['A0','A1'], ['B0','B1'],['C0','C1'],['D0','D1']],names=['quack','woof','honk','snarf'])
dfmi = pd.DataFrame(np.arange(32).reshape((len(midx), len(columns))),index=midx, columns=columns)
dfmi
                       foo  bar
quack woof honk snarf          
A0    B0   C0   D0       0    1
                D1       2    3
           C1   D0       4    5
                D1       6    7
      B1   C0   D0       8    9
                D1      10   11
           C1   D0      12   13
                D1      14   15
A1    B0   C0   D0      16   17
                D1      18   19
           C1   D0      20   21
                D1      22   23
      B1   C0   D0      24   25
                D1      26   27
           C1   D0      28   29
                D1      30   31

Then, somewhere along the line, you forgot which level number the name 'snarf' was associated with, and even how many levels there were, and you'd like to do something like:

dfmi[dfmi.snarf=='D1']

except the DataFrame is big so keeping another, reset_indexed copy around would take too much space and it would be slow anyway, and, being lazy, you don't want to go look it up, and you don't want to dive into yet another syntax with query:

dfmi.query('snarf'=='D1')

oops!

dfmi.query("'snarf'=='D1'")

oops AGAIN!

dfmi.query("snarf=='D1'")
                      foo  bar
quack woof honk snarf          
A0    B0   C0   D1       2    3
           C1   D1       6    7
      B1   C0   D1      10   11
           C1   D1      14   15
A1    B0   C0   D1      18   19
           C1   D1      22   23
      B1   C0   D1      26   27
           C1   D1      30   31

Finally!


Solution

  • Not sure it's more convenient, but one alternative to string based query is to use index.get_level_values:

    dfmi[dfmi.index.get_level_values('snarf') == 'D1']
    
                           foo  bar
    quack woof honk snarf          
    A0    B0   C0   D1       2    3
               C1   D1       6    7
          B1   C0   D1      10   11
               C1   D1      14   15
    A1    B0   C0   D1      18   19
               C1   D1      22   23
          B1   C0   D1      26   27
               C1   D1      30   31