(Apologies if this is a repost. The closest answer I could find was this: multiindex selecting in pandas and it doesn't quite do what I want. Instead I'll use the data from that question as an example because my own data has a similar setup but would require a bit more explanation)
Here's the data (reposted):
0 1 2 3
first second third
C one mean 3 4 2 7
std 4 1 7 7
two mean 3 1 4 7
std 5 6 7 0
three mean 7 0 2 5
std 7 3 7 1
H one mean 2 4 3 3
std 5 5 3 5
two mean 5 7 0 6
std 0 1 0 2
three mean 5 2 5 1
std 9 0 4 6
V one mean 3 7 3 9
std 8 7 9 3
two mean 1 9 9 0
std 1 1 5 1
three mean 3 1 0 6
std 6 2 7 4
Based on a prior query, I've got a set of tuples of first and second index levels based on the values in column 0 that are less than equal to 3. For example:
# selects rows where column 0 has a 'mean' value <= 3. ser = df.loc[(slice(None), slice(None), 'mean'), 0] ser = ser[ser <= 3] idx_tuples = [(val[0], val[1]) for val in ser.index] # Has value: [(C, one), (C, two), (H, one), (V, one), (V, two), (V, three)]
Now I would like to select both the mean and std rows from the original frame (df
) for index values that satisfy the above set of tuples. I could iterate through each tuple individually followed by a concatenate operation, but I was wondering if there is a more Pythonic / Pandas-ic(?) way to achieve what I want with intelligent slicing, etc. (particularly because the number of tuples in my dataset numbers in the thousands)?
Thanks!
PS: Note that my problem arises in the fact that specific tuples of the first 2 levels satisfy the selection condition, so I can't do a naive cross-product of values from get_level_values
.
If you can't get away from that list of tuples, I'd suggest a merge
. The merge
will destroy the MultiIndex, so we need to remove it first, so that the 'third' level is brought along and then we reconstruct it in the end.
cols = ['first', 'second']
df = (df.reset_index()
.merge(pd.DataFrame(idx_tuples, columns=cols), on=cols)
.set_index(df.index.names))
0 1 2 3
first second third
C one mean 3 4 2 7
std 4 1 7 7
two mean 3 1 4 7
std 5 6 7 0
H one mean 2 4 3 3
std 5 5 3 5
V one mean 3 7 3 9
std 8 7 9 3
two mean 1 9 9 0
std 1 1 5 1
three mean 3 1 0 6
std 6 2 7 4
Another alternative is to change how you create this selection. You can instead use transform
to get a Boolean Mask across the ['first', 'second'] levels if the 'mean' row is <=3.
m = ((df[0].le(3) & (df.index.get_level_values('third') == 'mean'))
.groupby(['first', 'second']).transform('any'))
df.loc[m]
# Same as above