Here is a minimal example:
import pandas as pd
import numpy as np
np.random.seed(0)
idx = pd.MultiIndex.from_product([[1,2,3], ['a', 'b', 'c'], [6, 7]])
df = pd.DataFrame(np.random.randn(18), index=idx)
selection = [(1, 'a'), (2, 'b')]
I would like to select all the rows in df
that have as index that starts with any of the items in selection
. So I would like to get the sub dataframe of df
with the indices:
(1, 'a', 6), (1, 'a', 7), (2, 'b', 6), (2, 'b', 7)
What is the most straightforward/pythonian/pandasian way of doing this? What I found:
sel = [id[:2] in selection for id in df.index]
df.loc[sel]
You could use boolean indexing with isin
:
out = df[df.index.isin(selection)]
Output:
0
1 a 6 1.560268
7 0.674709
2 b 6 0.848069
7 0.130719
If you want to select other levels, drop the unused leading levels:
# here we want to select on levels 1 and 2
selection = [('a', 6), ('b', 7)]
df[df.index.droplevel(0).isin(selection)]
Output:
0
1 a 6 1.560268
b 7 0.137769
2 a 6 0.754946
b 7 0.130719
3 a 6 -2.275646
b 7 -2.199944