Search code examples
pythonpandasindexingmulti-index

Retrieve columns from pandas MultiIndex with multiple slices


I have a dataframe with MultiIndex columns (2 levels) and the labels of the second levels are not the same for the different groups of the first column. Consider this example:

df1 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=list('abc'))
df1.columns = pd.MultiIndex.from_product((["df1"], df1.columns))
df2 = pd.DataFrame(np.arange(100, 112).reshape(4, 3),
                   columns=list('def'))
df2.columns = pd.MultiIndex.from_product((["df2"], df2.columns))
df = pd.concat((df1, df2), axis=1)

It looks something like this

    df1         df2
    a   b   c   d   e   f
0   0   1   2   100 101 102
1   3   4   5   103 104 105
2   6   7   8   106 107 108
3   9   10  11  109 110 111

Now I would like to find a simple way to retrieve one or more columns of the first and the second group. I.e. something like

    df1     df2
    a   b   d
0   0   1   100
1   3   4   103
2   6   7   106
3   9   10  109

However, whatever I try, doesn't work. For example,

df.loc[:, [('df1', 'a'), ('df2', ['d', 'e'])]]

throws TypeError: unhashable type: 'list'. The same is true for

df.loc[:, [pd.IndexSlice['df1', 'a'], pd.IndexSlice['df2', ['d', 'e']]]]

Is there a way of selecting these columns without using pd.concat?


Solution

  • Your question is similar to Python Pandas slice multiindex by second level index (or any other level)

    Then df.loc[:, (slice(None), ('a', 'b', 'd'))] should give the expected result.