Search code examples
pythonpandasmergemulti-index

Merge two dataframe when one has multiIndex in pandas


I have MultiIndex dataframe (table1) and I want to merge specific columns from another dataframe that is not multiIndex (table 2).

Example of table 1:

>>>            name          2020-10-21                   2020-10-22   ...
Column                     9    10    11   12          9    10    11    12   

0             A5          2.1   2.2   2.4  2.8        5.4   3.4   1.1   7.3
1             B9          7.2   1.2   14.5  7.5       3.4   5.2   6.4   8.1
2             C3          1.1   6.5   8.4   9.1       1.1   4.3   6.5   8.7
...

Example of table 2:

>>>name   indc   control   code       
0    A5   0.32    yes       1
1    C3   0.11    no        2
2    B18  0.23    yes       2
3    B9   0.45    no        3

I want to merge the column "code" based on key "name" from table 2 (and "index" from table 1) to get the code beside te name:

>>>          index              2020-10-21                   2020-10-22   ...
Column                code     9    10    11   12          9    10    11    12   

0             A5        1     2.1   2.2   2.4  2.8        5.4   3.4   1.1   7.3
1             B9        3    7.2   1.2   14.5  7.5       3.4   5.2   6.4   8.1
2             C3        2     1.1   6.5   8.4   9.1       1.1   4.3   6.5   8.7
...

I know how to merge when the index is not multindex level, then I do so something like this:

df = table1.merge(table2[['code','name']], how = 'left',
                left_on = 'index', right_on = 'name')

but now I get error:

UserWarning: merging between different levels can give an unintended result (2 levels on the left,1 on the right) warnings.warn(msg, UserWarning)

and then:

ValueError: 'index' is not in list

when I print the columns I can see that thy are like tuples but I don't know why it says the index is not in list as when I print the oclumns of table 1 I get:

Index([   ('index', ''),  (2020-10-22, 9)...

so i'm a bit confused.

My end goal: to merge the code column based on the columns "name" and "index"


Solution

  • For correct working need MultiIndex in both DataFrames:

    df2 = table2[['code','name']].rename(columns={'name':'index'})
    df2.columns = pd.MultiIndex.from_product([df2.columns, ['']])
    
    
    df = table1.merge(df2, how = 'left', on = [('index', '')])
    
    #if necessary reorder columns names
    cols = df.columns[:1].tolist() + df.columns[-1:].tolist() + df.columns[1:-1].tolist()
    
    df = df[cols]
    print (df)
      index code 2020-10-21                 2020-10-22               
                          9   10    11   12          9   10   11   12
    0    A5    1        2.1  2.2   2.4  2.8        5.4  3.4  1.1  7.3
    1    B9    3        7.2  1.2  14.5  7.5        3.4  5.2  6.4  8.1
    2    C3    2        1.1  6.5   8.4  9.1        1.1  4.3  6.5  8.7