Search code examples
pythonpandasmulti-level

pandas 0.20: How do I do a custom sort of the columns with multi-level indexes?


New to pandas. Using pandas 0.20, so no CategoricalDtype. I want to do a custom sort of the columns after merging a few df with concat. The merged df will have multi-level indexed columns.

Using Categorical, it doesn't work for custom sort.

dfs=[table2, table3,table4]
L_list=['A', 'B', 'C']     
test=pd.Categorical(L_list, categories=['B', 'C','A'], ordered=True)

merged_df = pd.concat(

    dfs, axis=1,

    keys=pd.MultiIndex.from_arrays([

        test,       # Top Level Keys

        ['Cat1', 'Cat2', 'Cat1']  # Second Level Keys

    ], names=['Importance', 'Category'])

)

output=merged_df.sort_index(axis=1, level=[0])

Current State

**Merged_df**

Importance| A         |  B   | C       |
Category | Cat1       | Cat2 | Cat1    |
         |Total Assets| AUMs | Revenue |
Firm 1   | 100        | 300  | 300     |
Firm 2   | 200        | 3400 | 200     |
Firm 3   | 300        | 800  | 400     |
Firm 4   | NaN        | 800  | 350     |


Desired State

**Merged_df**

Importance|  B   | C       | A            |
Category  | Cat2 | Cat1    | Cat1         |
         |AUMs | Revenue | Total Assets |
Firm 1   | 300  | 300     | 100          |  
Firm 2   | 3400 | 200     | 200          |  
Firm 3   |  800  | 400     | 300          |  
Firm 4   |  800  | 350     | NaN          |  


Solution

  • not sure of all possibilities with 0.20, but an idea is to convert the multiindex column to a frame, change each level to categorical data (like you did with test in your question), then sort_values the dataframe, keep the index that is columns sorted as you wanted to rearange merged_df columns. see with this example:

    # simple example
    dfs=[
        pd.DataFrame({'a':[0,1]}, index=[0,1]), 
        pd.DataFrame({'b':[0,1]}, index=[0,1]), 
        pd.DataFrame({'c':[0,1]}, index=[0,1]),
    ]
    
    L_list=['A', 'B', 'B'] # changed C to have 2 B with 2 Cat  
    
    merged_df = pd.concat(
        dfs, axis=1,
        keys=pd.MultiIndex.from_arrays([
            test,       # Top Level Keys
            ['Cat1', 'Cat1', 'Cat2']  # Second Level Keys
        ], names=['Importance', 'Category'])
    )
    print(merged_df)
    #      A    B     
    #   Cat1 Cat1 Cat2
    #      a    b    c
    # 0    0    0    0
    # 1    1    1    1
    

    so you can do

    # create dataframes of columns names
    col_df = merged_df.columns.to_frame()
    
    # change to catagorical each level you want
    col_df[0] = pd.Categorical(col_df[0], categories=['B', 'C','A'], ordered=True)
    col_df[1] = pd.Categorical(col_df[1], categories=['Cat2', 'Cat1'], ordered=True)
    
    # sort values and get the index
    print(col_df.sort_values(by=col_df.columns.tolist()).index)
    # MultiIndex([('B', 'Cat2', 'c'), # B is before A and Cat2 before Cat1
    #             ('B', 'Cat1', 'b'),
    #             ('A', 'Cat1', 'a')],
    #            )
    output = merged_df[col_df.sort_values(by=col_df.columns.tolist()).index]
    print(output)  
    #      B         A
    #   Cat2 Cat1 Cat1
    #      c    b    a
    # 0    0    0    0
    # 1    1    1    1