Search code examples
pythonpandasmulti-indexhierarchical

Selecting columns from pandas MultiIndex


I have DataFrame with MultiIndex columns that looks like this:

# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

sample data

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level?

Currently I am doing it like this:

import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
data.reindex_axis(new_index, axis=1)

expected result

It doesn't feel like a good solution, however, because I have to bust out itertools, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix or xs way of doing this, but everything I tried resulted in errors.


Solution

  • It's not great, but maybe:

    >>> data
            one                           two                    
              a         b         c         a         b         c
    0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
    1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
    2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
    3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
    >>> data.loc[:,data.columns.get_level_values(1).isin({"a", "c"})]
            one                 two          
              a         c         a         c
    0 -0.927134  0.711426  0.854065  1.140052
    1 -0.690745 -0.631856  0.178464 -0.418541
    2  1.086432  0.808235 -0.418109  1.886883
    3 -0.373822  1.329105  1.774723 -0.617690
    

    would work?