Tags: python, pandas, sorting, multi-index

How to use two key functions when sorting a MultiIndex dataframe?


In this call to df.sort_index() on a MultiIndex DataFrame, how can I use func_1 for level 'one' and func_2 for level 'two'?

func_1 = lambda s: s.str.lower()
func_2 = lambda x: np.abs(x)
m_sorted = df_multi.sort_index(level=['one', 'two'], key=func_1)

The documentation says "For MultiIndex inputs, the key is applied per level", which is ambiguous: it does not say whether a different key can be given for each level.
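One way to see what "applied per level" means in practice is to pass a probe key that records what it receives. This is a sketch on a made-up three-row frame; `probe` and `calls` are illustrative names, not pandas API, and it assumes pandas 1.1 or later (where `sort_index` gained the `key` parameter). The expectation it checks is that the key is called once for each level listed in `level`, each time with that level's values as an `Index` named after the level.

```python
import pandas as pd

# Toy frame with the same level names as the question (made-up data).
df = pd.DataFrame(
    {'A': [1, 2, 3]},
    index=pd.MultiIndex.from_arrays(
        [['b', 'a', 'C'], [3, -1, 2]], names=['one', 'two']
    ),
)

calls = []  # records (level name, whether pandas passed an Index)


def probe(level):
    # Record what pandas hands to the key for this level.
    calls.append((level.name, isinstance(level, pd.Index)))
    return level  # unchanged, so the sort itself is the default one


df.sort_index(level=['one', 'two'], key=probe)
print(calls)  # [('one', True), ('two', True)]
```

So "per level" means the same function is called separately on each level, not that a different function can be supplied for each one.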


import pandas as pd
import numpy as np
np.random.seed(3)

# Create a DataFrame indexed by a MultiIndex ('one', 'two')
choice = lambda a, n: np.random.choice(a, n, replace=True)
df_multi = pd.DataFrame({
    'one': pd.Series(choice(['a', 'B', 'c'], 8)),
    'two': pd.Series(choice([1, -2, 3], 8)),
    'A': pd.Series(choice([2, 6, 9, 7], 8))
    })
df_multi = df_multi.set_index(['one', 'two'])

# Sort MultiIndex
func_1 = lambda s: s.str.lower()
func_2 = lambda x: np.abs(x)
m_sorted = df_multi.sort_index(level=['one'], key=func_1)
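A minimal sketch of why a single key is not enough here: with `level=['one', 'two']`, pandas applies whichever function is passed to both levels, so `func_1` reaches the integer level 'two' (no `.str` accessor) and `func_2` reaches the string level 'one' (no `abs` for `str`). The three-row frame is toy data standing in for the random one above, and the `errors` dict is just an illustrative way to collect what goes wrong; the specific exception types are what one would expect, not a documented guarantee.

```python
import numpy as np
import pandas as pd

# Toy frame: string level 'one', integer level 'two' (made-up data).
df = pd.DataFrame(
    {'A': [2, 6, 9]},
    index=pd.MultiIndex.from_arrays(
        [['a', 'B', 'c'], [1, -2, 3]], names=['one', 'two']
    ),
)

func_1 = lambda s: s.str.lower()
func_2 = lambda x: np.abs(x)

errors = {}  # key function name -> name of the exception it triggered
for name, func in [('func_1', func_1), ('func_2', func_2)]:
    try:
        # The same key is applied to BOTH levels, so one level gets the wrong function.
        df.sort_index(level=['one', 'two'], key=func)
    except (AttributeError, TypeError) as exc:
        errors[name] = type(exc).__name__

print(errors)
```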

Solution

  • sort_index accepts a single key function, which it applies to each level in turn.

    That said, you can use a wrapper function that dispatches to the desired function based on the level's name:

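    def sorter(level, default=lambda x: x):
        # `level` is an Index holding one level's values; its .name is the level name
        return {
            'one': lambda s: s.str.lower(),  # case-insensitive order
            'two': np.abs,                   # order by absolute value
        }.get(level.name, default)(level)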
    
    df_multi.sort_index(level=['one', 'two'], key=sorter)
    

    NB. If a level name has no entry in the mapping, the default function is used, which returns the level unchanged.

    Another option is to use numpy.lexsort instead of sort_index:

    # (level name, key function) pairs, primary sort key first
    sorters = [('one', lambda s: s.str.lower()), ('two', np.abs)]
    
    out = df_multi.iloc[np.lexsort([f(df_multi.index.get_level_values(lvl))
                                    for lvl, f in sorters[::-1]])]
    

    np.lexsort treats the last key as the primary one, hence the [::-1].

    Output:

             A
    one two   
    a    1   6
        -2   2
         3   7
    B    1   6
        -2   7
        -2   7
         3   2
         3   6
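For a self-contained check, here is a sketch that rebuilds the question's data and runs both approaches side by side. `make_level_key` is a hypothetical helper (not part of pandas) that generalizes the `sorter` wrapper into a factory, so the level-to-function mapping is passed in rather than hard-coded; it assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data with the same seed and the same order of draws.
np.random.seed(3)
choice = lambda a, n: np.random.choice(a, n, replace=True)
df_multi = pd.DataFrame({
    'one': choice(['a', 'B', 'c'], 8),
    'two': choice([1, -2, 3], 8),
    'A': choice([2, 6, 9, 7], 8),
}).set_index(['one', 'two'])


def make_level_key(funcs, default=lambda x: x):
    """Hypothetical helper: turn a {level name: function} dict into a sort_index key."""
    def key(level):
        # pandas passes each sorted level as an Index whose .name is the level name;
        # levels missing from `funcs` fall back to `default` (identity).
        return funcs.get(level.name, default)(level)
    return key


# Insertion order = sort priority: 'one' is the primary key, 'two' the secondary.
level_funcs = {'one': lambda s: s.str.lower(), 'two': np.abs}

# Approach 1: sort_index with a key that dispatches on the level name.
via_sort_index = df_multi.sort_index(level=list(level_funcs),
                                     key=make_level_key(level_funcs))

# Approach 2: np.lexsort, which treats the LAST key as primary, hence reversed().
keys = [f(df_multi.index.get_level_values(lvl))
        for lvl, f in reversed(list(level_funcs.items()))]
via_lexsort = df_multi.iloc[np.lexsort(keys)]

print(via_sort_index)
```

Both sorts are expected to be stable, so rows with identical keys keep their original relative order and the two results should match row for row; comparing them with `DataFrame.equals` is a quick sanity check.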