Read this older post updated recently How to merge dataframes based on an "OR" condition. In that example there were 2 possible columns to merge on, in this case there are three A, B and C, with C being a string.
The two df could match on any combination of A, B OR C. And each produces unique merge records on their own. Using the above example as a guide without luck.
Appreciate any help!
DF2
A | B | C | D |
---|---|---|---|
123 | 321 | abc | company1 |
456 | rss | company2 | |
321 | 432 | xyz | company3 |
klm | company4 |
DF1
A | B | C | F | G |
---|---|---|---|---|
123 | abc | active | 650 | |
456 | active | 754 | ||
321 | 432 | active | 820 | |
klm | active | 840 |
Working Code
suff_A = ['_on_A_match_1', '_on_A_match_2', '_on_A_Match_3']
suff_B = ['_on_B_match_1', '_on_B_match_2', '_on_B_match_3']
suff_C = ['_on_C_match_1', '_on_C_match_2', '_on_C_match_3']
Get an error message on this piece too many values to unpack
DF3 = pd.concat([DF1.merge(DF2, on='A', suffixes=suff_A),
DF1.merge(DF2, on='B', suffixes=suff_B),
DF1.merge(DF2, on='C', suffixes=suff_C)])
duplicates = (DF3.A_on_B_match_1 == DF3.A_on_B_match_2 == DF3.A_on_C_match_3,
DF3.B_on_A_match_1 == DF3.B_on_A_match_2 == DF3.B_on_C_match_3,
kbd>DF3.C_on_A_match_1 == DF3.C_on_B_match_2)
DF3.loc[~duplicates]
Expected Outcome
A | B | C | D | F | G |
---|---|---|---|---|---|
123 | 321 | abc | company1 | active | 650 |
456 | rss | company2 | active | 754 | |
321 | 432 | xyz | company3 | active | 820 |
klm | company4 | active | 840 |
DF3 = pd.concat([
DF1.merge(DF2, on='A'),
DF1.merge(DF2, on='B'),
DF1.merge(DF2, on='C')
])
print(DF3.drop_duplicates().reset_index(drop=True))
prints
A B_x C_x F G B_y C_y D A_x B A_y C
0 123 abc active 650 321 abc company1 NaN NaN NaN NaN
1 456 active 754 456 rss company2 NaN NaN NaN NaN
2 456 active 754 klm company4 NaN NaN NaN NaN
3 klm active 840 456 rss company2 NaN NaN NaN NaN
4 klm active 840 klm company4 NaN NaN NaN NaN
5 321 432 active 820 432 xyz company3 NaN NaN NaN NaN
6 NaN NaN abc active 650 NaN klm company4 123 NaN
7 NaN NaN klm active 840 NaN klm company4 NaN
8 NaN NaN active 754 NaN rss company2 456 NaN
9 NaN NaN active 820 NaN xyz company3 321 432 321 NaN
10 NaN NaN active 650 321 NaN company1 123 NaN 123 abc
11 NaN NaN active 840 NaN company4 NaN klm