python pandas dataframe filtering multi-index

Filter MultiIndex with Query Strings


I have a fairly large DataFrame, say 600 index entries, and want to use filter criteria to produce a reduced version of the DataFrame containing only the parts where the criteria are true. From the research I've done, filtering works well when you're applying expressions to the data and already know which index labels you're operating on. What I want to do, however, is apply the filtering criteria to the index itself. See the example below.

MultiIndex values are in bold; MultiIndex level names are in italics.

[Image: example input DataFrame]

I'd like to apply the criteria as follows (or something along these lines):

df = df[MultiIndex.query('base == 115 & Al.isin(stn)')]

Then maybe do something like this:

df = df.transpose()[MultiIndex.query('Fault.isin(cont)')].transpose()

To result in:

[Image: expected output DataFrame]

I think fundamentally I'm trying to produce a boolean list to mask the MultiIndex. A quick way to apply a pandas query to a 2D list would also be acceptable. For now, one option seems to be to take the MultiIndex, convert it to a DataFrame, and then apply whatever filtering I want to get the True/False array. I'm concerned that this will be slow, though.
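
The idea described above (convert each MultiIndex to a DataFrame, evaluate an expression against it, and use the result as a boolean mask) could look roughly like the sketch below. The data is hypothetical: it assumes rows indexed by (season, cont) and columns indexed by (stn, base), with made-up values. Passing `engine="python"` is a precaution chosen here so the `.str` call does not depend on the optional numexpr backend.

```python
import pandas as pd

# Hypothetical frame: rows are (season, cont), columns are (stn, base).
# The numeric values are made up for illustration.
index = pd.MultiIndex.from_product(
    [["Summer", "Winter"], ["Fault", "Trip"]], names=["season", "cont"]
)
columns = pd.MultiIndex.from_tuples(
    [("Alpha", 115), ("Beta", 115), ("Gamma", 230)], names=["stn", "base"]
)
df = pd.DataFrame(
    [[1.0, 0.8, 0.7], [1.2, 0.9, 1.1], [0.7, 0.6, 0.5], [1.3, 1.0, 0.9]],
    index=index,
    columns=columns,
)

# Turn each MultiIndex into a plain DataFrame (one column per level) and
# evaluate a boolean expression against it to get a True/False array.
col_mask = (
    df.columns.to_frame(index=False)
    .eval('base == 115 and stn.str.contains("Al")', engine="python")
    .to_numpy()
)
row_mask = (
    df.index.to_frame(index=False)
    .eval('cont == "Fault"', engine="python")
    .to_numpy()
)

# Apply both masks at once; no transpose is needed.
result = df.loc[row_mask, col_mask]
print(result)
```

Only the index labels are converted, not the data, so the extra work scales with the number of labels rather than the number of cells.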


Solution

  • If what you're after is using the nifty df.query() syntax to slice your data, then you're better off "unpivoting" your DataFrame, turning all index levels and column labels into regular columns.

    You can create an "unpivot" DataFrame with:

    df_unpivot = df.stack(level=[0, 1]).rename('value').reset_index()
    

    Which will produce a DataFrame that looks like this:

       season   cont    stn  base  value
    0  Summer  Fault  Alpha   115    1.0
    1  Summer  Fault   Beta   115    0.8
    2  Summer  Fault  Gamma   230    0.7
    3  Summer   Trip  Alpha   115    1.2
    4  Summer   Trip   Beta   115    0.9
    ...
    

    Which you can then query with:

    df_unpivot.query(
        'cont.str.contains("Fault") and '
        'stn.str.contains("Al") and '
        'base == 115'
    )
    

    Which produces:

       season   cont    stn  base  value
    0  Summer  Fault  Alpha   115    1.0
    6  Winter  Fault  Alpha   115    0.7
    

    These are the two values you were expecting.
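
For reference, here is a self-contained sketch of the unpivot-then-query steps above. The input frame is an assumption: it is built so that rows are indexed by (season, cont) and columns by (stn, base), and any values not shown in the output above are made up. `engine="python"` is added here as a precaution for the `.str` calls; it is not part of the original snippet.

```python
import pandas as pd

# Hypothetical input: rows are (season, cont), columns are (stn, base).
# Values not shown in the answer's output are invented for illustration.
index = pd.MultiIndex.from_product(
    [["Summer", "Winter"], ["Fault", "Trip"]], names=["season", "cont"]
)
columns = pd.MultiIndex.from_tuples(
    [("Alpha", 115), ("Beta", 115), ("Gamma", 230)], names=["stn", "base"]
)
df = pd.DataFrame(
    [[1.0, 0.8, 0.7], [1.2, 0.9, 1.1], [0.7, 0.6, 0.5], [1.3, 1.0, 0.9]],
    index=index,
    columns=columns,
)

# Move both column levels into the row index, name the values,
# then flatten every index level into an ordinary column.
df_unpivot = df.stack(level=[0, 1]).rename("value").reset_index()

# All four former index levels can now be used directly in query().
result = df_unpivot.query(
    'cont.str.contains("Fault") and '
    'stn.str.contains("Al") and '
    'base == 115',
    engine="python",
)
print(result)
```

If the original wide layout is needed afterwards, the filtered rows can be reshaped back, but for inspection or further filtering the long form is usually easier to work with.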