Say I have a dataframe with a MultiIndex like this:
import pandas as pd
import numpy as np
my_index = pd.MultiIndex.from_product(
[(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
names=["number", "size", "letter"]
)
df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)
x y
number size letter
3 small B 0.950073 0.599918
A 0.014450 0.472736
C 0.208064 0.778538
tall B 0.979631 0.367234
A 0.832459 0.449875
C 0.761929 0.053144
medium B 0.460764 0.800131
A 0.355746 0.573813
C 0.078924 0.058865
1 small B 0.405209 0.354636
A 0.536242 0.012904
C 0.458910 0.723627
tall B 0.859898 0.442954
A 0.109729 0.885598
C 0.378363 0.220695
medium B 0.652191 0.685181
A 0.503525 0.400973
C 0.454671 0.188798
2 small B 0.407654 0.168782
A 0.393451 0.083023
C 0.073432 0.165209
tall B 0.678226 0.108497
A 0.718348 0.077935
C 0.595500 0.146271
medium B 0.719985 0.422167
A 0.950950 0.532390
C 0.687721 0.920229
Now I want to sort the index by the different levels, first number, then size, and finally letter.
If I do this...
df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)
... the size of course gets sorted in alphabetical order.
x y
number size letter
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But I want it to be sorted by a custom key. I know I can sort the size level with a custom sort function like this:
custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])
df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
1 tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But how can I sort by all levels like for df_1 and use the custom key on the second level? Expected output:
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
And how should I define the custom key function, so that I also can access the level in sort_index by name like this?
df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)
Here, it gives a KeyError: 'Level size not found'
The ideal would be to use ordered Categorical
data.
Else, you can use a custom mapper based on the level name:
# define here custom sorters
# all other levels will be sorted by default order
order = {'size': ['small', 'medium', 'tall']}
def sorter(s):
if s.name in order:
return s.map({k:v for v,k in enumerate(order[s.name])})
return s
out = df_0.sort_index(level=["number", "size", "letter"], key=sorter)
Output:
x y
number size letter
1 small A 0.530753 0.687982
B 0.722848 0.974920
C 0.174058 0.695016
medium A 0.397016 0.550404
B 0.426989 0.843007
C 0.929218 0.497728
tall A 0.159078 0.005675
B 0.917871 0.384265
C 0.685435 0.585242
2 small A 0.423254 0.838356
B 0.342158 0.209632
...