Regarding this df:
Amount type
Month_year 2019-06-01 2019-07-01 2019-06-01 2019-07-01
TYPE_ID 1 2 1 2 1 2 1 2
ID
100 20 10 40 20 1 1 2 1
200 80 60 30 10 2 2 1 1
The following code:
df = df.sort_index(axis=1, level=[1,2])
produces this:
Amount type Amount ... type Amount type
Month_year 2019-06-01 2019-06-01 2019-06-01 ... 2019-07-01 2019-07-01 2019-07-01
TYPE_ID 1 1 2 ... 1 2 2
ID ...
100 20 1 10 ... 2 20 1
200 80 2 60 ... 1 10 1
I really don't understand what's going on. I've read the docs but there are no examples and the description is really obscure.
Could anyone explain to me how this method works and how I received this result?
Essentially,
sort_index
with axis=1
sorts the column headers, and this ordering is then used to set the order of the columns.
And, the corollary,
sort_index
with axis=0
sorts the index, and this ordering is then used to set the order of the rows.
This is what your input df
looks like:
The top three "rows" in the picture above correspond to a pandas MultiIndex column of df
, which looks like this:
df.columns
MultiIndex([('Amount', '2019-06-01', 1),
('Amount', '2019-06-01', 2),
('Amount', '2019-07-01', 1),
('Amount', '2019-07-01', 2),
( 'type', '2019-06-01', 1),
( 'type', '2019-06-01', 2),
( 'type', '2019-07-01', 1),
( 'type', '2019-07-01', 2)])
Let's pretend your 3-level multiIndex column is magically converted into a DataFrame which each level its own column called cdf
:
cdf
level_0 level_1 level_2
(1) Amount 2019-06-01 1
(2) Amount 2019-06-01 2
(3) Amount 2019-07-01 1
(4) Amount 2019-07-01 2
(5) type 2019-06-01 1
(6) type 2019-06-01 2
(7) type 2019-07-01 1
(8) type 2019-07-01 2
The row numbers here correspond to the column identifiers in the original DataFrame. Let's see what happens when we sort cdf
by the last two columns:
cdf.sort_values(['level_1', 'level_2'])
level_0 level_1 level_2
(1) Amount 2019-06-01 1
(5) type 2019-06-01 1
(2) Amount 2019-06-01 2
(6) type 2019-06-01 2
(3) Amount 2019-07-01 1
(7) type 2019-07-01 1
(4) Amount 2019-07-01 2
(8) type 2019-07-01 2
Notice the index of the sorted cdf
:
(1) (5) (2) (6) (3) (7) (4) (8)
Now let's see what happens when we apply the sort_index
operation to df
:
df.sort_index(level=[1, 2], axis=1)
The ellipses in the center indicate that not all columns can be displayed due to the width of the terminal (in fact, columns (6) and (3) are not displayed but they are there very much there), but that's not the interesting part. Contrast the column ordering here with the row ordering of sorted cdf
, you'll see these are are identical.