Search code examples
pythonpandasmulti-index

Selecting subset tuples of pairs of levels in MultiIndex


(Apologies if this is a repost. The closest answer I could find was this: multiindex selecting in pandas and it doesn't quite do what I want. Instead I'll use the data from that question as an example because my own data has a similar setup but would require a bit more explanation)

Here's the data (reposted):

                    0  1  2  3
first second third            
C     one    mean   3  4  2  7
             std    4  1  7  7
      two    mean   3  1  4  7
             std    5  6  7  0
      three  mean   7  0  2  5
             std    7  3  7  1
H     one    mean   2  4  3  3
             std    5  5  3  5
      two    mean   5  7  0  6
             std    0  1  0  2
      three  mean   5  2  5  1
             std    9  0  4  6
V     one    mean   3  7  3  9
             std    8  7  9  3
      two    mean   1  9  9  0
             std    1  1  5  1
      three  mean   3  1  0  6
             std    6  2  7  4

Based on a prior query, I've got a set of tuples of first and second index levels based on the values in column 0 that are less than equal to 3. For example:

# selects rows where column 0 has a 'mean' value <= 3.
ser = df.loc[(slice(None), slice(None), 'mean'), 0]
ser = ser[ser <= 3]

idx_tuples = [(val[0], val[1]) for val in ser.index]
# Has value: [(C, one), (C, two), (H, one), (V, one), (V, two), (V, three)]

Now I would like to select both the mean and std rows from the original frame (df) for index values that satisfy the above set of tuples. I could iterate through each tuple individually followed by a concatenate operation, but I was wondering if there is a more Pythonic / Pandas-ic(?) way to achieve what I want with intelligent slicing, etc. (particularly because the number of tuples in my dataset numbers in the thousands)?

Thanks!

PS: Note that my problem arises in the fact that specific tuples of the first 2 levels satisfy the selection condition, so I can't do a naive cross-product of values from get_level_values.


Solution

  • If you can't get away from that list of tuples, I'd suggest a merge. The merge will destroy the MultiIndex, so we need to remove it first, so that the 'third' level is brought along and then we reconstruct it in the end.

    cols = ['first', 'second']
    df = (df.reset_index()
            .merge(pd.DataFrame(idx_tuples, columns=cols), on=cols)
            .set_index(df.index.names))
    
                        0  1  2  3
    first second third            
    C     one    mean   3  4  2  7
                 std    4  1  7  7
          two    mean   3  1  4  7
                 std    5  6  7  0
    H     one    mean   2  4  3  3
                 std    5  5  3  5
    V     one    mean   3  7  3  9
                 std    8  7  9  3
          two    mean   1  9  9  0
                 std    1  1  5  1
          three  mean   3  1  0  6
                 std    6  2  7  4
    

    Another alternative is to change how you create this selection. You can instead use transform to get a Boolean Mask across the ['first', 'second'] levels if the 'mean' row is <=3.

    m = ((df[0].le(3) & (df.index.get_level_values('third') == 'mean'))
            .groupby(['first', 'second']).transform('any'))
    
    df.loc[m]
    # Same as above