Search code examples
pandasmulti-index

Move column level to top in multi column index pandas DataFrame


What's a pythonic way to move a certain column level to the top in a pandas multi column index?

Toy example:

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_arrays(
    [
        ["a1", "a1", "a1", "a1"],
        ["b1", "b1", "b2", "b2"],
        ["x1", "x1", "x1", "x1"],
        ["c1", "c1", "c1", "c1"],
    ],
    names=(
        "unknown_level_name_0",
        "unknown_level_name_1",
        "known_level_name",
        "unknown_level_name_last",
    ),
)

df = pd.DataFrame(np.random.randint(0, 100, [5, 4]), columns=cols)

print(df)

unknown_level_name_0     a1            
unknown_level_name_1     b1      b2    
known_level_name         x1      x1    
unknown_level_name_last  c1  c1  c1  c1
0                        37  34  97  19
1                        54  47  53  46
2                        63  94  14  85
3                        16  51  27  96
4                        32  64  76  26

I am looking for the following result:

known_level_name         x1      x1    
unknown_level_name_0     a1            
unknown_level_name_1     b1      b2    
unknown_level_name_last  c1  c1  c1  c1
0                        37  34  97  19
1                        54  47  53  46
2                        63  94  14  85
3                        16  51  27  96
4                        32  64  76  26

EDIT:
There can be a variable number of levels. Most level names are unknown. However, there will always be one familiar level name (here: "known_level_name"). Using reorder_levels or swaplevel might become tricky if I don't know the exact position of "known_level_name".


Solution

  • Here is a generic function to move a column level (by label or index) to the top:

    def move_top(df, col, inplace=False):
        if col in df.columns.names:
            idx = df.columns.names.index(col)
        elif isinstance(col, int) and 0 < col < len(df.columns.names):
            idx = col
        else:
            raise IndexError(f'invalid index "{col}"')
        order = list(range(len(df.columns.names)))
        order.pop(idx)
        order = [idx]+order
        if inplace:
            df.columns = df.columns.reorder_levels(order=order)
        else:
            return df.reorder_levels(order, axis=1)
        
    move_top(df, 'known_level_name')
    

    output:

    known_level_name         x1            
    unknown_level_name_0     a1            
    unknown_level_name_1     b1      b2    
    unknown_level_name_last  c1  c1  c1  c1
    0                        33  30  23  77
    1                        10  73  80  33
    2                         7  54  52   9
    3                        71  99  22  22
    4                        83  15  86  40