Search code examples
pythonpandasdataframeslicemulti-index

pandas slice 3-level multiindex based on a list with 2 levels


Here is a minimal example:

import pandas as pd
import numpy as np
np.random.seed(0)
idx = pd.MultiIndex.from_product([[1,2,3], ['a', 'b', 'c'], [6, 7]])
df = pd.DataFrame(np.random.randn(18), index=idx)
selection = [(1, 'a'), (2, 'b')]

I would like to select all the rows in df that have as index that starts with any of the items in selection. So I would like to get the sub dataframe of df with the indices:

(1, 'a', 6), (1, 'a', 7), (2, 'b', 6), (2, 'b', 7)

What is the most straightforward/pythonian/pandasian way of doing this? What I found:

sel = [id[:2] in selection for id in df.index]
df.loc[sel]

Solution

  • You could use boolean indexing with isin:

    out = df[df.index.isin(selection)]
    

    Output:

                  0
    1 a 6  1.560268
        7  0.674709
    2 b 6  0.848069
        7  0.130719
    

    If you want to select other levels, drop the unused leading levels:

    # here we want to select on levels 1 and 2
    selection = [('a', 6), ('b', 7)]
    
    df[df.index.droplevel(0).isin(selection)]
    

    Output:

                  0
    1 a 6  1.560268
      b 7  0.137769
    2 a 6  0.754946
      b 7  0.130719
    3 a 6 -2.275646
      b 7 -2.199944