Search code examples
pythonexcelpandasmergexlsx

Python Pandas - Merge Multiple Dataframes


I have two .xlsx files that I need to merge using Pandas. The data frames are formatted as follows: Data Frame 1:

+-------+-------+-------+-------+-------+
| Index | Col_A | Col_B | Col_C | Col_Q | 
+-------+-------+-------+-------+-------+ 
|   1   |   A1  |   B1  |   C1  |   Q1  | 
|   2   |   A2  |   B2  |   C2  |   Q2  | 
|   3   |   A3  |   B3  |   C3  |   Q3  | 
|  ...  |  ...  |  ...  |  ...  |  ...  | 
|  100  |  A100 |  B100 |  C100 |  Q100 | 
+-------+-------+-------+-------+-------+

Data Frame 2:

+-------+--------+--------+--------+ 
| Index |  Col_X |  Col_Y |  Col_Z | 
+-------+--------+--------+--------+ 
|   1   | XData1 | YData1 | Part 1 | 
|   2   | XData2 | YData2 | Part 2 | 
|   3   | XData3 | YData3 | Part 3 | 
|  ...  |  ...   |  ...   |  ...   | 
|   N   | XDataN | YDataN | Part N | 
+-------+--------+--------+--------+

Col_Z in Data Frame 2 is a unique part number and N is a value less than 100. This part number will match with one and only one of the values in Col_A, Col_B, or Col_C of DF1. If a part number matches one of the values, I want to merge all of the data in that row of Data Frame 2 to the right of Data Frame 1. The final table should look something like this: Data Frame 3:

+-------+-------+-------+-------+-------+-------+--------+--------+--------+
| Index | Col_A | Col_B | Col_C | Col_Q | Index |  Col_A |  Col_B |  Col_Z |
+-------+-------+-------+-------+-------+-------+--------+--------+--------+
|   1   |  A1   |  B1   |   C1  |  Q1   |   X   | XDataX | YDataX | Part X | 
|   2   |  A2   |  B2   |   C2  |  Q2   |   Y   | XDataY | YDataY | Part Y | 
|   3   |  A3   |  B3   |   C3  |  Q3   |   Z   | XDataZ | YDataZ | Part Z | 
|  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |  ...   |  ...   |  ...   | 
|  100  |  A100 | B100  |  C100 | Q100  |   N   | XDataN | YDataN | Part N | 
+-------+-------+-------+-------+-------+-------+--------+--------+--------+

I want to keep the index of DF1 and DF2 will be rearranged so that Col_Z matches with either Col_A, Col_B, or Col_C. What I tried was three separate merge commands where left_on was Col_A, Col_B, and Col_C respectively. The attribute right_on is Col_Z in all three cases. This would give me three new data frames for the instances where Col_A, Col_B, and Col_C matched the part number. When I try to merge these data frames to create a final data frame with all of the information I run into issues such as the data being shifted to the right. Any ideas on a solution?


Solution

  • if you have created your merged dataframes such as:

    dfa = df1.merge(df2, left_on = 'Col_A', right_on = 'Col_Z', how = 'left')
    

    Note the how = 'left' to keep all data from df1. And same idea for Col_B and Col_C, then you can do:

    df_output = dfa.fillna(dfb).fillna(dfc)
    

    You will fill the nan in dfa with the values from dfb then dfc if they exists and are not nan too.