import pandas as pd
import numpy as np
# Sample data
index = pd.MultiIndex.from_tuples([
('A', 'a1', 'x'),
('A', 'a1', 'y'),
('A', 'a2', 'x'),
('A', 'a2', 'y'),
('B', 'b1', 'x'),
('B', 'b1', 'y'),
('B', 'b2', 'x'),
('B', 'b2', 'y')
], names=['level_1', 'level_2', 'level_3'])
data = np.random.randn(len(index))
df = pd.DataFrame(data, index=index, columns=['value'])
Say I have for example the above dataframe, which is multi-indexed with 3 levels.
Now, my goal is to select a subset from this dataframe, where the first two levels of index comes from a subset of the cartesian product (A,B) * (a1, a2, b1, b2), say S = [(A, a1), (B, b2)]. I want to keep third level of the multi-index.
I expect the result to be like
Original DataFrame:
value
level_1 level_2 level_3
A a1 x 0.123456
y 0.234567
a2 x 0.345678
y 0.456789
B b1 x 0.567890
y 0.678901
b2 x 0.789012
y 0.890123
Subset DataFrame:
value
level_1 level_2 level_3
A a1 x 0.123456
y 0.234567
B b2 x 0.789012
y 0.890123
Use DataFrame.droplevel
for remove 3 level, so possible filter by subset by Index.isin
in boolean indexing
:
S = [('A', 'a1'), ('B', 'b2')]
out = df[df.droplevel(2).index.isin(S)]
print (out)
value
level_1 level_2 level_3
A a1 x 0.545790
y -1.298511
B b2 x 0.018436
y -1.076408