Similar to How do I select a subset of a DataFrame based on one level of a MultiIndex, let
df = pd.DataFrame({"v":range(12)},
index=pd.MultiIndex.from_frame(
pd.DataFrame({"name":4*["a"]+4*["b"]+4*["c"],
"rank":[x*x for x in range(12)]})))
and suppose I want to select only rows with the second level rank
being within 25 from its smallest value for the given first level name
:
v
name rank
a 0 0
1 1
4 2
9 3
b 16 4
25 5
36 6
c 64 8
81 9
This time I have no idea how to do that easily (other than a convoluted combination of to_frame
and groupby
- as explained in How do I select a subset of a DataFrame based on a condition on a column).
Very similar, you can obtain the min rank with reset_index and groupby, then query:
# select only one column `v` so we don't copy too many data
min_rank = (df['v'].reset_index('rank')
.groupby('name', sort=False)['rank'] # avoid shuffling the dataframe
.transform('min')
.to_numpy() # avoid index alignment
)
# pass the variable into query string with @
df.query('rank < @min_rank + 25')
# also
# df.loc[df.index.get_level_values('rank') < min_rank + 25]
Output:
v
name rank
a 0 0
1 1
4 2
9 3
b 16 4
25 5
36 6
c 64 8
81 9