Search code examples
pythonpandasdataframeconcatenationmulti-index

Create a multiIndex with pandas


I have a dataframe with two index like this:

Index1       Index2             200701     200702     200703      
alphas       Fourth Quartile    41.7421    41.1807     39.071           
             Third Quartile     74.1573    95.0195    90.6572          
             Second Quartile   -34.2001   -42.0068   -21.6236  
             First Quartile      39.293    37.3475    34.1704        
             All_Quartiles      37.6624    38.5957    38.0504        
betas        Fourth Quartile    18.1041    23.0865    33.7109       
             Third Quartile    -51.9743   -93.1191   -87.1772        
             Second Quartile    121.262    131.556    103.549        
             First Quartile     26.1859    28.5129    31.8663          
             All_Quartiles       24.511    23.1601    0.159067  

I need new index, something like this:

New_index  Index1     Index 2            200701     200702     200703      
Sector     alphas     Fourth Quartile    41.7421    41.1807     39.071              
                      Third Quartile     74.1573    95.0195    90.6572         
                      Second Quartile   -34.2001   -42.0068   -21.6236      
                      First Quartile      39.293    37.3475    34.1704        
                      All_Quartiles      37.6624    38.5957    38.0504     
           betas      Fourth Quartile    18.1041    23.0865    33.7109       
                      Third Quartile    -51.9743   -93.1191   -87.1772          
                      Second Quartile    121.262    131.556    103.549            
                      First Quartile     26.1859    28.5129    31.8663          
                      All_Quartiles       24.511    23.1601    0.159067     

I have many dataframes multindex belong to different sectors and I need to merger each one with a loop for.


Solution

  • You can manually recreate the entire MultiIndex, but that's a lot of writing. I prefer concat with the keys argument to add the additional level. The names argument allows us to give it a name.

    pd.concat([df], keys=['Sector'], names=['New_index']+df.index.names)
    

                                        200701    200702      200703
    New_index Index1 Index2                                         
    Sector    alphas Fourth Quartile   41.7421   41.1807   39.071000
                     Third Quartile    74.1573   95.0195   90.657200
                     Second Quartile  -34.2001  -42.0068  -21.623600
                     First Quartile    39.2930   37.3475   34.170400
                     All_Quartiles     37.6624   38.5957   38.050400
              betas  Fourth Quartile   18.1041   23.0865   33.710900
                     Third Quartile   -51.9743  -93.1191  -87.177200
                     Second Quartile  121.2620  131.5560  103.549000
                     First Quartile    26.1859   28.5129   31.866300
                     All_Quartiles     24.5110   23.1601    0.159067
    

    Here would be the same manually re-creating the MultiIndex.

    arrays = []
    
    arrays.append(pd.Index(['Sector']*len(df), name='New_Index')) # 0th level sector
    
    # Add all existing levels
    for i in range(df.index.nlevels):
        arrays.append(df.index.get_level_values(i))
    
    new_idx = pd.MultiIndex.from_arrays(arrays)
    
    df.index = new_idx
    

    The above is basically the internals of DataFrame.set_index(append=True) so you could clean it up a bit with that instead.

    df['New_index'] = 'Sector'                  # New column
    df = df.set_index('New_index', append=True) # Bring it to index
    df = df.reorder_levels([2, 0, 1])           # Move it to the front