I have two .xlsx files that I need to merge using Pandas. The data frames are formatted as follows: Data Frame 1:
+-------+-------+-------+-------+-------+
| Index | Col_A | Col_B | Col_C | Col_Q |
+-------+-------+-------+-------+-------+
| 1 | A1 | B1 | C1 | Q1 |
| 2 | A2 | B2 | C2 | Q2 |
| 3 | A3 | B3 | C3 | Q3 |
| ... | ... | ... | ... | ... |
| 100 | A100 | B100 | C100 | Q100 |
+-------+-------+-------+-------+-------+
Data Frame 2:
+-------+--------+--------+--------+
| Index | Col_X | Col_Y | Col_Z |
+-------+--------+--------+--------+
| 1 | XData1 | YData1 | Part 1 |
| 2 | XData2 | YData2 | Part 2 |
| 3 | XData3 | YData3 | Part 3 |
| ... | ... | ... | ... |
| N | XDataN | YDataN | Part N |
+-------+--------+--------+--------+
Col_Z in Data Frame 2 is a unique part number and N is a value less than 100. This part number will match with one and only one of the values in Col_A, Col_B, or Col_C of DF1. If a part number matches one of the values, I want to merge all of the data in that row of Data Frame 2 to the right of Data Frame 1. The final table should look something like this: Data Frame 3:
+-------+-------+-------+-------+-------+-------+--------+--------+--------+
| Index | Col_A | Col_B | Col_C | Col_Q | Index | Col_A | Col_B | Col_Z |
+-------+-------+-------+-------+-------+-------+--------+--------+--------+
| 1 | A1 | B1 | C1 | Q1 | X | XDataX | YDataX | Part X |
| 2 | A2 | B2 | C2 | Q2 | Y | XDataY | YDataY | Part Y |
| 3 | A3 | B3 | C3 | Q3 | Z | XDataZ | YDataZ | Part Z |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | A100 | B100 | C100 | Q100 | N | XDataN | YDataN | Part N |
+-------+-------+-------+-------+-------+-------+--------+--------+--------+
I want to keep the index of DF1 and DF2 will be rearranged so that Col_Z matches with either Col_A, Col_B, or Col_C. What I tried was three separate merge commands where left_on was Col_A, Col_B, and Col_C respectively. The attribute right_on is Col_Z in all three cases. This would give me three new data frames for the instances where Col_A, Col_B, and Col_C matched the part number. When I try to merge these data frames to create a final data frame with all of the information I run into issues such as the data being shifted to the right. Any ideas on a solution?
if you have created your merged dataframes such as:
dfa = df1.merge(df2, left_on = 'Col_A', right_on = 'Col_Z', how = 'left')
Note the how = 'left'
to keep all data from df1
. And same idea for Col_B and Col_C, then you can do:
df_output = dfa.fillna(dfb).fillna(dfc)
You will fill the nan
in dfa
with the values from dfb
then dfc
if they exists and are not nan
too.