Search code examples
pythonpython-3.xpandasdataframemulti-index

Combine multiple dataframes into dataframe of multi-index columns


Suppose I have 3 dataframes,

df_1 representing revenues

Year        TSLA    MSFT     AVY
2019         851     200   112.8
2018         725     150    92.6

df_2 representing some other revenues

Year        TSLA    MSFT     AVY
2019          10      13      17
2018          11      14      18

and df_3 representing expenses

Year        TSLA    MSFT     AVY
2019         110     213     317
2018         111     214     418

what might the code be to obtain the following df?

       TSLA                             MSFT                             AVY
Year   revenues other_revenues expenses revenues other_revenues expenses revenues other_revenues expenses
2019        851             10      110      200             13      213    112.8             17      317
2018        725             11      111      150             14      214     92.6             18      418

where the columns are in multi-index form?

Thanks


Solution

  • Use concat with DataFrame.swaplevel and DataFrame.sort_index for sorting MultiIndex:

    #if not Year is index first create it
    L = [x.set_index('Year') for x in [df_1, df_2, df_3]]
    df = (pd.concat(L, 
                   axis=1, 
                   keys=('revenues', 'other_revenues', 'expenses'))
            .swaplevel(1, 0, axis=1)
            .sort_index(axis=1))
    print (df)
             AVY                            MSFT                            TSLA  \
         revenues expenses other_revenues revenues expenses other_revenues revenues   
    Year                                                                           
    2019   112.8      317             17     200      213             13     851   
    2018    92.6      418             18     150      214             14     725   
    
                                  
         expenses other_revenues  
    Year                          
    2019      110             10  
    2018      111             11  
    

    EDIT: For order like in original add DataFrame.reindex by MultiIndex.from_product from unique values of first level of MultiIndex:

    sub = ['revenues', 'other_revenues', 'expenses']
    L = [x.set_index('Year') for x in [df_1, df_2, df_3]]
    df = (pd.concat(L, 
                   axis=1, 
                   keys=sub)
            .swaplevel(1, 0, axis=1))
    
    mux = pd.MultiIndex.from_product([df.columns.levels[0], sub])
    df = df.reindex(mux, axis=1)
            
    print (df)
             TSLA                             MSFT                          \
         revenues other_revenues expenses revenues other_revenues expenses   
    Year                                                                     
    2019      851             10      110      200             13      213   
    2018      725             11      111      150             14      214   
    
              AVY                          
         revenues other_revenues expenses  
    Year                                   
    2019    112.8             17      317  
    2018     92.6             18      418