Search code examples
pythonpandas

pandas multi index subset selection


import pandas as pd
import numpy as np

# Sample data
index = pd.MultiIndex.from_tuples([
    ('A', 'a1', 'x'),
    ('A', 'a1', 'y'),
    ('A', 'a2', 'x'),
    ('A', 'a2', 'y'),
    ('B', 'b1', 'x'),
    ('B', 'b1', 'y'),
    ('B', 'b2', 'x'),
    ('B', 'b2', 'y')
], names=['level_1', 'level_2', 'level_3'])

data = np.random.randn(len(index))
df = pd.DataFrame(data, index=index, columns=['value'])

Say I have for example the above dataframe, which is multi-indexed with 3 levels.

Now, my goal is to select a subset from this dataframe, where the first two levels of index comes from a subset of the cartesian product (A,B) * (a1, a2, b1, b2), say S = [(A, a1), (B, b2)]. I want to keep third level of the multi-index.

I expect the result to be like

Original DataFrame:
                        value
level_1 level_2 level_3       
A       a1      x     0.123456
                y     0.234567
        a2      x     0.345678
                y     0.456789
B       b1      x     0.567890
                y     0.678901
        b2      x     0.789012
                y     0.890123

Subset DataFrame:
                        value
level_1 level_2 level_3       
A       a1      x     0.123456
                y     0.234567
B       b2      x     0.789012
                y     0.890123

Solution

  • Use DataFrame.droplevel for remove 3 level, so possible filter by subset by Index.isin in boolean indexing:

    S = [('A', 'a1'), ('B', 'b2')]
    
    out = df[df.droplevel(2).index.isin(S)]
    print (out)
                                value
    level_1 level_2 level_3          
    A       a1      x        0.545790
                    y       -1.298511
    B       b2      x        0.018436
                    y       -1.076408