Search code examples
pythonpandasindexingrenamemulti-index

Renaming multiindex row from a look up dictionary with Pandas


Given a multilevel row as shown in current presentation, I would like to rename the first level index based on information stored in a look up dictionary.

Presently, I have the idea of transpose the df and looping to the now column name. Thereafter, the suitable new columns name will be checked and rename based on the information stored in the dictionary (i.e.,str_dic).

However, I wonder whether there is more straight forward of doing this?

The drafted code is as below

import pandas as pd

def create_df (idx):
    df = pd.DataFrame ( {'A': [11, 21, 31],
                         'B': [12, 22, 32],
                         'C': [13, 23, 33]},
                        index=['ONE', 'TWO', 'THREE'] )

    df.columns = pd.MultiIndex.from_product ( [['level1'], ['level2'], df.columns] )
    df = df.set_index ( [[f'idx_{idx}'] * len ( df )], append=True ).swaplevel ( 0 )
    # df = df.set_index ( [['temp_general'] * len ( df )], append=True ).swaplevel ( 0 )
    return df

#look up dict
str_g = ['idx_0', 'idx_1', 'idx_2', 'idx_3','idx_4']
str_h = ['E', 'b', 'c', 'd','e']
str_dic = {str_g [i]: str_h [i] for i in range ( len ( str_g ) )}

# create the df
all_df = [create_df ( idx ) for idx in range ( 0, len(str_h)-1)] # Delibarately minus one for worse case scenario
df = pd.concat ( all_df, axis=0 )

df=df.T
all_ls=list(df.columns.values.tolist())
for xss in all_ls:
    df=df.rename ( columns={xss: str_dic[xss[0]]}, level=1) #WIP

Current presentation

level1        
            level2        
                 A   B   C
idx_0 ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
idx_1 ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
idx_2 ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
idx_3 ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33

Expected Output

            level1        
            level2        
                 A   B   C
a     ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
b     ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
c     ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33
d     ONE       11  12  13
      TWO       21  22  23
      THREE     31  32  33

#######################

Bonus: Based on suggestion by Anurag Dabas, how can we append another level on the highest level based on another look up dict.

Naively, I would proposed something like

df.index =  pd.MultiIndex.from_product ( [s1, df.index.get_level_values ( 0 ).unique (),
                                          df.index.get_level_values ( 1 ).unique ()] )

But it return an error

ValueError: Length mismatch: Expected axis has 12 elements, new values have 48 elements

The failed proposed amended based on Anurag:

str_global=['typ1','typ1','typ2','typ2','typ3']
global_dic = {str_h [i]: str_global [i] for i in range ( len ( str_h ) )}
s1 = [global_dic.get ( x ) for x in df.index.get_level_values ( 0 ).unique ()]

df.index =  pd.MultiIndex.from_product ( [s1, df.index.get_level_values ( 0 ).unique (),
                                          df.index.get_level_values ( 1 ).unique ()] )

Expected output

                    A   B   C
typ1    E     ONE       11  12  13
              TWO       21  22  23
              THREE     31  32  33
typ1    b     ONE       11  12  13
              TWO       21  22  23
              THREE     31  32  33
 typ2   c     ONE       11  12  13
              TWO       21  22  23
              THREE     31  32  33
 typ2   d     ONE       11  12  13
              TWO       21  22  23
              THREE     31  32  33

Solution

  • let's try replacing keys of str_dic in 'level 0' index with their values by using list comprehension then generate a MultiIndex by using pd.MultiIndex.from_product() and set that equal to the index of your dataframe by using index attribute::

    s=[str_dic.get(x) for x in df.index.get_level_values(0).unique()]
    df.index=pd.MultiIndex.from_product([s,df.index.get_level_values(1).unique()])
    

    Update:

    Since Now you have 4 unique level0 value and 3unique level0 index and 4 value in s1 so pd.MultiIndex.from_product() create 48 pairs of MultiIndex so it is not useful in that case so use:

    s1=[global_dic.get(x) for x in df.index.get_level_values(0)]
    df=df.set_index(pd.Series(s1).values,append=True)
    df.index=df.index.reorder_levels([2,0,1])
    

    output of df:

                    level1        
                    level2        
                     A   B   C
    E     ONE       11  12  13
          TWO       21  22  23
          THREE     31  32  33
    b     ONE       11  12  13
          TWO       21  22  23
          THREE     31  32  33
    c     ONE       11  12  13
          TWO       21  22  23
          THREE     31  32  33
    d     ONE       11  12  13
          TWO       21  22  23
          THREE     31  32  33