Search code examples
pythonpandasdataframeslicemulti-index

Pandas DataFrame - How to retrieve specific combinations of MultiIndex levels


I have the following DataFrame that uses a three-level MultiIndex:

In [1]: iterables = [[1, 2], ['foo', 'bar'], ['one', 'two']]
   ...: midx = pd.MultiIndex.from_product(iterables)
   ...: df = pd.DataFrame(np.random.randn(8), index=midx)
   ...: df

Out[1]:
                  0
1 foo one -0.217594
      two -1.361612
  bar one  2.477790
      two  0.874409
2 foo one  0.403577
      two  0.076111
  bar one  1.423512
      two  0.047898

I'd like to slice the index such that I keep all of the first level while only keeping the following combinations of the second two levels: ('foo', 'one') and ('bar', 'two'). That is, I'd like my output to look something like this:

                  0
1 foo one -0.217594
  bar two  0.874409
2 foo one  0.403577
  bar two  0.047898

Is it possible to do this in one line, using an attribute such as .loc, for example?


I know that I can take cross-sections of the desired combinations separately using the .xs function, but I'd prefer a shorter, more slice-like syntax. Specifically, having a one-liner is important for my use case.

It seems like the following should work:

df.loc[[(slice(None), 'foo', 'one'), (slice(None), 'bar', 'two')]]

But this results in a TypeError: unhashable type: 'slice'.


Solution

  • You can construct a Boolean mask by first dropping the first index level and then using pd.Index.isin with a list of tuples:

    df_masked = df[df.index.droplevel(0).isin([('foo', 'one'), ('bar', 'two')])]
    
    print(df_masked)
    
                      0
    1 foo one  1.510316
      bar two  0.260862
    2 foo one  0.813745
      bar two  0.023386