I'm new to Pandas (0.16.1), and want custom sort in multiindex so i use Categoricals. Part of my multiindex:
Part Defect Own
Кузов 504 ИП
Кузов 504 Итого
Кузов 504 ПС
Кузов 505 ПС
Кузов 506 ПС
Кузов 507 ПС
Кузов 530 ИП
Кузов 530 Итого
Кузов 530 ПС
I create pivot table with MultiIndex levels [Defect, Own]. Then i make "Own" Categorical (see p.s. part of question) to sort it as [ИП, ПС, Итого]. But when i prepend levels with "Part", which is also Categorical based on "Defect" level, and sort index with
pvt.sortlevel(0, inplace=True)
"Own" level is sorted in alphabetical order: [ИП, Итого, ПС]. How can i custom-sort two levels in multiindex?
P. S. I convert "Own" level to Categorical with the following code: create new column, replace index level with it. Is it ok?
def makeLevelCategorical(pdf, pname, cats):
names = pdf.index.names
namei = names.index(pname)
pdf["tmp"] = pd.Categorical(pdf.index.get_level_values(pname), categories=cats) #New temp column
pdf.set_index("tmp", append=True, inplace=True) #Append column to index
pdf = pdf.reset_index(pname, drop=True) #Remove /pname/ level
names2 = list(names)
names2[namei] = "tmp"
pdf.reorder_levels(names2) #Put "tmp" level to /pname/'s position
pdf.index.names = names #Rename "tmp" level to /pname/
return pdf
Sorting a multiindex can be done using the Dataframe.sort_index function.
Here is a small example:
df = pd.DataFrame(
{"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
"i2":[1,3,2,2,1,1,2,2,1,1,3,2],
"d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['i1', 'i2'], inplace=True)
df.sort_index()
Outputs:
d1
i1 i2
1 1 a
2 c
2 d
3 b
2 1 e
2 h
3 1 i
1 j
2 l
3 k
4 1 f
2 g
If you want to change the sort order on column basis, the Dataframe.sort_index function takes an argument ascending=
which can be given a list of [True, False]
statements corresponding to the columns in order.
Categorical is a new shiny dtype in pandas and it should be used, but it is not needed for this operation per se.
Edit due to comment:
Sort will always sort alphabetically or in reverse order. If you want custom sort, then you need to create a new column which can be sorted alphabetically but is a result of the column which can determine the sorting. Do this using Series.map, like this example, that sorts the datasets with vowels first:
mappings = {'a': 0, 'b':1, 'c':1, 'd':1,
'e':0, 'f':1, 'g':1, 'h':1,
'i':0, 'j':1, 'k': 1, 'l': 1}
df['sortby'] = df['d1'].map(mappings)
df.sort('sortby')
d1 sortby
i1 i2
1 1 a 0
2 1 e 0
3 1 i 0
1 3 b 1
2 c 1
2 d 1
4 1 f 1
2 g 1
2 2 h 1
3 1 j 1
3 k 1
2 l 1
If you do not want the sortby column after that, you can simply delete it, like this:
del df['sortby']