Search code examples
pandasdataframeindexinginsertmulti-level

Insert a level o in the existing data frame such that 4 columns are grouped as one


I want to do multiindexing for my data frame such that MAE,MSE,RMSE,MPE are grouped together and given a new index level. Similarly the rest of the four should be grouped together in the same level but different name

> mux3 = pd.MultiIndex.from_product([list('ABCD'),list('1234')],
> names=['one','two'])###dummy data 
>     df3 = pd.DataFrame(np.random.choice(10, (3, len(mux))), columns=mux3) #### dummy data frame
>     print(df3) #intended output required for the data frame in the picture given below

sample dataframe


Solution

  • Assuming column groups are already in the appropriate order we can simply create an np.arange over the length of the columns and floor divide by 4 to get groups and create a simple MultiIndex.from_arrays.

    Sample Input and Output:

    import numpy as np
    import pandas as pd
    
    initial_index = [1, 2, 3, 4] * 3
    np.random.seed(5)
    df3 = pd.DataFrame(
        np.random.choice(10, (3, len(initial_index))), columns=initial_index
    )
    
       1  2  3  4  1  2  3  4  1  2  3  4  # Column headers are in repeating order
    0  3  6  6  0  9  8  4  7  0  0  7  1
    1  5  7  0  1  4  6  2  9  9  9  9  1
    2  2  7  0  5  0  0  4  4  9  3  2  4
    
    # Create New Columns
    df3.columns = pd.MultiIndex.from_arrays([
        np.arange(len(df3.columns)) // 4,  # Group Each set of 4 columns together
        df3.columns  # Keep level 1 the same as current columns
    ], names=['one', 'two'])  # Set Names (optional)
    df3
    
    one  0           1           2         
    two  1  2  3  4  1  2  3  4  1  2  3  4
    0    3  6  6  0  9  8  4  7  0  0  7  1
    1    5  7  0  1  4  6  2  9  9  9  9  1
    2    2  7  0  5  0  0  4  4  9  3  2  4
    

    If columns are in mixed order:

    np.random.seed(5)
    df3 = pd.DataFrame(
        np.random.choice(10, (3, 8)), columns=[1, 1, 3, 2, 4, 3, 2, 4]
    )
    df3
    
       1  1  3  2  4  3  2  4  # Cannot select groups positionally
    0  3  6  6  0  9  8  4  7
    1  0  0  7  1  5  7  0  1
    2  4  6  2  9  9  9  9  1
    

    We can convert Index.to_series then enumerate columns using groupby cumcount then sort_index if needed to get in order:

    df3.columns = pd.MultiIndex.from_arrays([
        # Enumerate Groups to create new level 0 index
        df3.columns.to_series().groupby(df3.columns).cumcount(),
        df3.columns
    ], names=['one', 'two'])  # Set Names (optional)
    # Sort to Order Correctly
    # (Do not sort before setting columns it will break alignment with data)
    df3 = df3.sort_index(axis=1)
    df3
    
    one  0           1         
    two  1  2  3  4  1  2  3  4  # Notice Data has moved with headers
    0    3  0  6  9  6  4  8  7
    1    0  1  7  5  0  0  7  1
    2    4  9  2  9  6  9  9  1