Search code examples
python-3.xpandasdataframesortingmulti-index

How to lex sort multi index data frame with multi level column and then index into it?


I've created this df:

## creating multi index multi-level column dataframe
multi_ind = pd.MultiIndex.from_product([['Anna', 'Bob', 'Cara', 'Dan'],['2021', '2022']])
multi_cols = pd.MultiIndex.from_product([['Class XI', 'Class XII'],['Math' , 'Physics',     'Literature']])
data = np.random.randint(20,100, size = (8, 6))
df = pd.DataFrame(data, index =multi_ind, columns = multi_cols)
df.index.names = ['Student','Year']
df.columns.names = ['Class' , 'Subject']

df

I've sorted this df using sort_index()

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)

Now i'm trying to slice into both index and columns to create a subset. I want to slice the outer index and the inner level column

idx = pd.IndexSlice
df.loc[idx['Bob':'Cara', '2021'], idx['Class XII' , 'Math' : 'Physics'] ]

I keep getting this error!
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [1], lexsort depth 1'

part of the error

I am trying to slice both the outer index and the inner column level while choosing a value/list of values for the inner index ad outer column level but it seems I haven't sorted the df correctly.

What am I doing wrong with the lex sort? How do I fix the UnsortedIndexError? I've already tried sort_index()

Also, what is lexsort? Any simple reading material on lexsort?

Thank you so much!!


Solution

  • If sorting Mulitindex in columns it working well, because your code sorting only MulitIndex in index:

    df.sort_index(axis=1, inplace=True)
    print (df)
    
    idx = pd.IndexSlice
    df = df.loc[idx['Bob':'Cara', '2021'], idx['Class XII' , 'Math' : 'Physics'] ]
    print (df)
    Class        Class XII        
    Subject           Math Physics
    Student Year                  
    Bob     2021        59      86
    Cara    2021        89      84