I have the following DataFrame that uses a three-level MultiIndex:
In [1]: import numpy as np
   ...: import pandas as pd
   ...: iterables = [[1, 2], ['foo', 'bar'], ['one', 'two']]
   ...: midx = pd.MultiIndex.from_product(iterables)
   ...: df = pd.DataFrame(np.random.randn(8), index=midx)
   ...: df
Out[1]:
                  0
1 foo one -0.217594
      two -1.361612
  bar one  2.477790
      two  0.874409
2 foo one  0.403577
      two  0.076111
  bar one  1.423512
      two  0.047898
I'd like to slice the index so that I keep every value of the first level but only the following combinations of the last two levels: ('foo', 'one') and ('bar', 'two'). That is, I'd like my output to look something like this:
                  0
1 foo one -0.217594
  bar two  0.874409
2 foo one  0.403577
  bar two  0.047898
Is it possible to do this in one line, using an attribute such as .loc, for example?
I know that I can take cross-sections of the desired combinations separately using the .xs method, but I'd prefer a shorter, more slice-like syntax. Specifically, having a one-liner is important for my use case.
It seems like the following should work:
df.loc[[(slice(None), 'foo', 'one'), (slice(None), 'bar', 'two')]]
But this results in a TypeError: unhashable type: 'slice'.
You can construct a Boolean mask by first dropping the first index level and then calling isin on the remaining two-level index with a list of tuples:
df_masked = df[df.index.droplevel(0).isin([('foo', 'one'), ('bar', 'two')])]
print(df_masked)
                  0
1 foo one  1.510316
  bar two  0.260862
2 foo one  0.813745
  bar two  0.023386
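For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask idea. The seeded generator and the names wanted and mask are my own additions for illustration, so the printed values will not match the numbers shown above. Because a Boolean array is also accepted by .loc, the last step can be written in the .loc form the question asks about.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
# The seed is an assumption added here so that repeated runs give the same data.
iterables = [[1, 2], ['foo', 'bar'], ['one', 'two']]
midx = pd.MultiIndex.from_product(iterables)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal(8), index=midx)

# (second level, third level) pairs to keep; the first level is unrestricted.
wanted = [('foo', 'one'), ('bar', 'two')]

# droplevel(0) leaves a two-level MultiIndex; isin() then checks whether each
# row's remaining (level 1, level 2) tuple appears in `wanted`.
mask = df.index.droplevel(0).isin(wanted)

# A Boolean array is a valid .loc indexer, so this is the one-line selection.
df_masked = df.loc[mask]
print(df_masked)
```

The mask is computed purely from the index, so the same expression should work regardless of how many columns the frame has.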