Search code examples
pandasjoinmulti-index

Joining two pandas dataframes with multi-indexed columns


I want to join two pandas dataframes, one of which has multi-indexed columns.

This is how I make the first dataframe.

data_large = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 60, 50], "buy":[20, 30, 40]})
data_mini = pd.DataFrame({"name":["b", "c", "d"], "sell":[60, 20, 10], "buy":[30, 50, 40]})
data_topix = pd.DataFrame({"name":["a", "b", "c"], "sell":[10, 80, 0], "buy":[70, 30, 40]})

df_out = pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]], 
                   keys=['Large', 'Mini', 'Topix'], axis=1)\
           .rename_axis(mapper=['name'], axis=0).rename_axis(mapper=['product','buy_sell'], axis=1)
df_out

enter image description here

And this is the second dataframe.

group = pd.DataFrame({"name":["a", "b", "c", "d"], "group":[1, 1, 2, 2]})
group

enter image description here

How can I join the second to the first, on the column name, keeping the multi-indexed columns?

This did not work and it flattened the multi-index.

df_final = df_out.merge(group, on=['name'], how='left')

Any help would be appreciated!

enter image description here


Solution

  • If need MultiIndex after merge is necessary convert column group to MultiIndex DataFrame, here is converted column name to index for merge by index, else both columns has to be converted to MultiIndex:

    group = group.set_index('name')
    group.columns = pd.MultiIndex.from_product([group.columns, ['new']])
    
    df_final = df_out.merge(group, on=['name'], how='left')
    

    Or:

    df_final = df_out.merge(group, left_index=True, right_index=True, how='left')
    

    print (df_final)
    product  Large        Mini       Topix       group
    buy_sell  sell   buy  sell   buy  sell   buy   new
    name                                              
    a         10.0  20.0   NaN   NaN  10.0  70.0     1
    b         60.0  30.0  60.0  30.0  80.0  30.0     1
    c         50.0  40.0  20.0  50.0   0.0  40.0     2
    d          NaN   NaN  10.0  40.0   NaN   NaN     2
    

    Another possible way, but with warning is convert values to MultiIndex after merge:

    df_final = df_out.merge(group, on=['name'], how='left')
    

    UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right) warnings.warn(msg, UserWarning)


    L = [x if isinstance(x, tuple) else (x, 'new') for x in df_final.columns.tolist()]
    df_final.columns = pd.MultiIndex.from_tuples(L)   
    print (df_final)
      name Large        Mini       Topix       group
       new  sell   buy  sell   buy  sell   buy   new
    0    a  10.0  20.0   NaN   NaN  10.0  70.0     1
    1    b  60.0  30.0  60.0  30.0  80.0  30.0     1
    2    c  50.0  40.0  20.0  50.0   0.0  40.0     2
    3    d   NaN   NaN  10.0  40.0   NaN   NaN     2
    

    EDIT: If need group in MultiIndex:

    group = group.set_index(['name'])
    group.columns = pd.MultiIndex.from_product([group.columns, ['new']])
    
    df_final = (df_out.merge(group, on=['name'], how='left')
                      .set_index([('group','new')], append=True)
                      .rename_axis(['name','group']))
    print (df_final)
    product    Large        Mini       Topix      
    buy_sell    sell   buy  sell   buy  sell   buy
    name group                                    
    a    1      10.0  20.0   NaN   NaN  10.0  70.0
    b    1      60.0  30.0  60.0  30.0  80.0  30.0
    c    2      50.0  40.0  20.0  50.0   0.0  40.0
    d    2       NaN   NaN  10.0  40.0   NaN   NaN
    

    Or:

    df_final = df_out.merge(group, on=['name'], how='left').set_index(['name','group'])
    df_final.columns = pd.MultiIndex.from_tuples(df_final.columns)
    print (df_final)
               Large        Mini       Topix      
                sell   buy  sell   buy  sell   buy
    name group                                    
    a    1      10.0  20.0   NaN   NaN  10.0  70.0
    b    1      60.0  30.0  60.0  30.0  80.0  30.0
    c    2      50.0  40.0  20.0  50.0   0.0  40.0
    d    2       NaN   NaN  10.0  40.0   NaN   NaN