Search code examples
pythonpandasmulti-index

Reorder levels in pandas from_product-MultiIndex DataFrame, coupled with values?


The following situation currently appears quite often in my current work. I have a pandas DataFrame with a product-MultiIndex looking like this:

cols = pd.MultiIndex.from_product([['foo1', 'foo2'], ['bar1', 'bar2']], names=['foo', 'bar'])
df = pd.DataFrame(np.arange(5*4).reshape(5, 4), index=range(5), columns=cols)
df

foo foo1      foo2     
bar bar1 bar2 bar1 bar2
0      0    1    2    3
1      4    5    6    7
2      8    9   10   11
3     12   13   14   15
4     16   17   18   19

Now I want to swap the column levels of the DataFrame, so I tried this:

df.reorder_levels(['bar', 'foo'], axis=1)

bar bar1 bar2 bar1 bar2
foo foo1 foo1 foo2 foo2
0      0    1    2    3
1      4    5    6    7
2      8    9   10   11
3     12   13   14   15
4     16   17   18   19

But this is not what I want. I want to the order of the columns to change according to this nice canonical product-ordering. My current workaround looks like this:

cols_swapped = pd.MultiIndex.from_product([['bar1', 'bar2'], ['foo1', 'foo2']], names=['bar', 'foo'])
df.reorder_levels(cols_swapped.names, axis=1).loc[:, cols_swapped])

bar bar1      bar2     
foo foo1 foo2 foo1 foo2
0      0    2    1    3
1      4    6    5    7
2      8   10    9   11
3     12   14   13   15
4     16   18   17   19

This works, but is not so nice, e.g. because it is more confusing and a new MultiIndex has to be created. The situation, in which this often occurs to me, is that I compute a new feature for all of my columns. But after concatenating it to my df, it want to "sort" the corresponding new level into a new position. Say the new feature sits in level 0, then the workaround looks like this:

new_order = [1, 2, 0, 3, 4]
cols_swapped = pd.MultiIndex.from_product(
    [df.columns.levels[i] for i in new_order],
    names = [df.columns.names[i] for i in new_order]
)
df_swap = df.reorder_levels(cols_swapped.names, axis=1).loc[:, cols_swapped]

which is even less nice.

Is this supported in pandas? If yes, what would be a more elegant way to do it?


Solution

  • I believe need swaplevel with sort_index:

    df = df.swaplevel(0,1, axis=1).sort_index(axis=1)
    print (df)
    bar bar1      bar2     
    foo foo1 foo2 foo1 foo2
    0      0    2    1    3
    1      4    6    5    7
    2      8   10    9   11
    3     12   14   13   15
    4     16   18   17   19