Search code examples
pythonmergeconcatenation

Merging DF with OR condition three common columns


Read this older post updated recently How to merge dataframes based on an "OR" condition. In that example there were 2 possible columns to merge on, in this case there are three A, B and C, with C being a string.

The two df could match on any combination of A, B OR C. And each produces unique merge records on their own. Using the above example as a guide without luck.

Appreciate any help!

DF2

A B C D
123 321 abc company1
456 rss company2
321 432 xyz company3
klm company4

DF1

A B C F G
123 abc active 650
456 active 754
321 432 active 820
klm active 840

Working Code

suff_A = ['_on_A_match_1', '_on_A_match_2', '_on_A_Match_3']
suff_B = ['_on_B_match_1', '_on_B_match_2', '_on_B_match_3']
suff_C = ['_on_C_match_1', '_on_C_match_2', '_on_C_match_3']

Get an error message on this piece too many values to unpack

DF3 = pd.concat([DF1.merge(DF2, on='A', suffixes=suff_A),
DF1.merge(DF2, on='B', suffixes=suff_B),
DF1.merge(DF2, on='C', suffixes=suff_C)])

duplicates = (DF3.A_on_B_match_1 == DF3.A_on_B_match_2 == DF3.A_on_C_match_3,
DF3.B_on_A_match_1 == DF3.B_on_A_match_2 == DF3.B_on_C_match_3,
kbd>DF3.C_on_A_match_1 == DF3.C_on_B_match_2)

DF3.loc[~duplicates]

Expected Outcome

A B C D F G
123 321 abc company1 active 650
456 rss company2 active 754
321 432 xyz company3 active 820
klm company4 active 840

Solution

  • DF3 = pd.concat([
        DF1.merge(DF2, on='A'),
        DF1.merge(DF2, on='B'),
        DF1.merge(DF2, on='C')
    ])
    print(DF3.drop_duplicates().reset_index(drop=True))
    

    prints

          A  B_x  C_x       F    G  B_y  C_y         D  A_x    B  A_y    C
    0   123       abc  active  650  321  abc  company1  NaN  NaN  NaN  NaN
    1        456       active  754  456  rss  company2  NaN  NaN  NaN  NaN
    2        456       active  754       klm  company4  NaN  NaN  NaN  NaN
    3             klm  active  840  456  rss  company2  NaN  NaN  NaN  NaN
    4             klm  active  840       klm  company4  NaN  NaN  NaN  NaN
    5   321  432       active  820  432  xyz  company3  NaN  NaN  NaN  NaN
    6   NaN  NaN  abc  active  650  NaN  klm  company4  123            NaN
    7   NaN  NaN  klm  active  840  NaN  klm  company4                 NaN
    8   NaN  NaN       active  754  NaN  rss  company2       456       NaN
    9   NaN  NaN       active  820  NaN  xyz  company3  321  432  321  NaN
    10  NaN       NaN  active  650  321  NaN  company1  123  NaN  123  abc
    11  NaN       NaN  active  840       NaN  company4       NaN       klm