Search code examples
pythonpandasmulti-index

How to slice multiindex dataframe with list of labels on one level


MultiIndex dataframes are very powerful but personally I think there is no enough (clear) documentations on it, specially for different type of slicing... Here is my question:

How to slice a multi-indexed dataframe just on one level with a list of labels? Please help me if you have a solution (without reseting indexes and converting the dataframe to single level index! Which is obvious and not efficient)

For example, we have following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame(index=range(10))
df['id'] = pd.Series(range(10,20))
df['name'] = [f'name_{id}' for id in range(10,20)]
df['price'] = np.random.rand(df.index.size)
df['date'] = pd.date_range('20200310', '20200319')
df = df.set_index(['id', 'date'])
df

enter image description here

Slicing on one label is working just fine:

df.xs('2020-03-10', level='date', drop_level=False)

enter image description here

But how can we slice on a list of labels on that level?

df.xs(('2020-03-10', '2020-03-11', '2020-03-12'), level='date', drop_level=False)

This leads to an exception:

enter image description here

However Python doc says that "key" parameter could be a tuple as well:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html

enter image description here


Solution

  • For filter by multiple values use Index.get_level_values with Index.isin and boolean indexing:

    a = df[df.index.get_level_values('date').isin(('2020-03-10', '2020-03-11', '2020-03-12'))]
    print (a)
                      name     price
    id date                         
    10 2020-03-10  name_10  0.557772
    11 2020-03-11  name_11  0.122315
    12 2020-03-12  name_12  0.775976
    

    However Python doc says that "key" parameter could be a tuple as well:

    Tuple is possible use, but working differently - you can select by both labels like:

    b = df.xs((10, '2020-03-10'), drop_level=False)
    print (b)
    name      name_10
    price    0.348808
    Name: (10, 2020-03-10 00:00:00), dtype: object
    
    c = df.xs((10, '2020-03-10'), level=('id','date'), drop_level=False)
    print (c)
                      name     price
    id date                         
    10 2020-03-10  name_10  0.239876
    

    Like @yatu mentioned, another solution with IndexSlice is with : for all first levels and last : for all columns:

    df = df.loc[pd.IndexSlice[:, ['2020-03-10', '2020-03-11', '2020-03-12']], :]
    print (df)
                      name     price
    id date                         
    10 2020-03-10  name_10  0.557488
    11 2020-03-11  name_11  0.592082
    12 2020-03-12  name_12  0.547747