
Is slicing a MultiIndex pandas DataFrame by integer values incorrect?


We have a MultiIndex DataFrame where the top-level index uses integer values. Slicing for a specific value returns all index values up to the requested value, not just the requested value. Is this a bug, or are we doing it wrong?

Example:

import numpy as np
import pandas as pd
midx = pd.MultiIndex.from_product([[1,2], ['A', 'B']])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=['Values'])

df.loc[(slice(1), slice(None)), :]  # Slice for only top index value=1

This first slice returns just the index values = 1, as expected:

     Values
1 A       0
  B       1

But:

df.loc[(slice(2), slice(None)), :]  # Slice for only top index value=2

returns index value 1 as well as value 2, like this:

     Values
1 A       0
  B       1
2 A       2
  B       3

where we expect this:

     Values
2 A       2
  B       3

Solution

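  • When you call slice(x) with a single argument, x is the stop value (see the manual), so slice(2) means "from the start up to 2" and returns everything up to and including that value. slice(1) only appeared to work because 1 is the first value in the level, so nothing precedes it. To select a single value, supply the desired label directly: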

    df.loc[(2, slice(None)), :]
    

    Output:

         Values
    2 A       2
      B       3
    

    Note that in calls to .loc, slice end values are inclusive; see the manual and this Q&A.
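
As a rough illustration of the points above, the sketch below rebuilds the DataFrame from the question and shows a few ways to select only the rows whose top-level value is 2. The variable names (by_label, by_slice, by_index_slice, by_xs) are made up for this example; which option reads best is a matter of taste.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the question.
midx = pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=["Values"])

# A single-argument slice has no start, so slice(2) compares equal to slice(None, 2).
print(slice(2) == slice(None, 2))

# Option 1: pass the label itself for the first level.
by_label = df.loc[(2, slice(None)), :]

# Option 2: a label slice with both endpoints set to 2 (.loc includes both ends).
by_slice = df.loc[(slice(2, 2), slice(None)), :]

# Option 3: the same slice written with pd.IndexSlice.
idx = pd.IndexSlice
by_index_slice = df.loc[idx[2:2, :], :]

# Option 4: a cross-section that keeps the first level in the result.
by_xs = df.xs(2, level=0, drop_level=False)

print(by_label)
```

Options 2 and 3 generalise to real ranges (for example idx[2:5, :]), while option 1 is probably the simplest choice when only one value is wanted.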