python pandas dataframe multi-index

Value-based partial slicing with non-existing keys is now deprecated


Running the example code below with pandas 2.2.3 raises KeyError: 'D':

import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 2), ('B', 2)],
    names=['letter', 'number']
)
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]}, index=index)
idx = pd.IndexSlice
result = df.loc[idx[['A', 'D'], [1,2]], :]

Does pandas offer any alternatives for searching a multi-index with values that don't exist?

With pandas 1.5.3, the same code returns the result I expect:

               value
letter number
A      1          10
       2          20

Solution

  • When you run this code with pandas 1.5.3 you should in fact receive a FutureWarning:

    FutureWarning: The behavior of indexing on a MultiIndex with a nested sequence of labels is deprecated and will change in a future version. series.loc[label, sequence] will raise if any members of 'sequence' or not present in the index's second level. To retain the old behavior, use series.index.isin(sequence, level=1)

    (Note that it should read: "are not present".)


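    So, as the warning suggests, let's use Index.isin to build a boolean mask. Keys that are absent from a level (such as 'D') simply match nothing instead of raising: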

    m = (df.index.isin(['A', 'D'], level='letter') 
         & df.index.isin([1, 2], level='number'))
    
    out = df.loc[m, :]
    

    Output:

                   value
    letter number       
    A      1          10
           2          20
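
As a possible alternative to building the mask by hand, DataFrame.query can refer to named index levels as if they were columns, and its `in` operator also ignores values that are not present. The sketch below rebuilds the frame from the question; the names `letters`, `numbers` and `out_query` are only illustrative.

```python
import pandas as pd

# Same frame as in the question
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 2), ('B', 2)],
    names=['letter', 'number'],
)
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]}, index=index)

# Illustrative variable names; 'D' does not exist in the 'letter' level
letters = ['A', 'D']
numbers = [1, 2]

# Index level names can be used inside the query string;
# @name refers to a local Python variable
out_query = df.query("letter in @letters and number in @numbers")
print(out_query)
```

This selects the same two rows, ('A', 1) and ('A', 2), as the boolean mask above. It relies on the index levels having names.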
    

    If you have many different conditions, you could build a dictionary that maps each level name to its accepted values and combine the resulting masks with np.logical_and.reduce:

    import numpy as np
    
    dict_isin = {
        'letter': ['A', 'D'],
        'number': [1, 2]
    }
    
    m = np.logical_and.reduce(
        [df.index.isin(v, level=k) for k, v in dict_isin.items()]
    )
    
    out2 = df.loc[m, :]
    
    out2.equals(out)
    # True
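
If this pattern comes up often, the dictionary approach could be wrapped in a small function. The following is a minimal sketch, not a pandas API: `select_by_levels` is a hypothetical helper name, and it assumes every key in `conditions` is a valid level name of the frame's MultiIndex.

```python
import numpy as np
import pandas as pd


def select_by_levels(df, conditions):
    """Hypothetical helper: keep rows whose index levels match `conditions`.

    `conditions` maps a level name to the values accepted for that level.
    Values that do not occur in the level are ignored rather than raising.
    """
    masks = [
        df.index.isin(values, level=level)
        for level, values in conditions.items()
    ]
    if not masks:
        # No conditions given: nothing to filter on
        return df
    # A row is kept only if it satisfies every per-level mask
    return df.loc[np.logical_and.reduce(masks)]


index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('A', 3), ('B', 1), ('B', 2), ('B', 2)],
    names=['letter', 'number'],
)
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]}, index=index)

picked = select_by_levels(df, {'letter': ['A', 'D'], 'number': [1, 2]})
print(picked)
```

With the frame from the question, `picked` holds the rows ('A', 1) and ('A', 2). If none of the requested values exist, the result is an empty frame rather than a KeyError.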