Search code examples
pythonpandasdataframemergeconcatenation

concat two dataframes and exclude overlapping rows


I'm trying to concat two dataframes df1 and df2:

Input

        name   age   hobby   married
index
0       jack   20    hockey  yes
1       ben    19    chess   no
2       lisa   30    golf    no
        name   age    hobby      job
index
0       jack   20     hockey     student
1       anna   34     football   finance
2       dan    26     golf       retail

which I want to match on multiple columns, so let's say ['name', 'age'], to get df:

Output

        name   age   hobby     married   job
index
0       jack   20    hockey    yes       student
1       ben    19    chess     no        /
2       lisa   30    golf      no        /
3       anna   34    football  /         finance
4       dan    26    golf      /         retail

Is it possible to do this by using concat? Because I can't find out how to match the list of keys to avoid the overlapping rows...


Solution

  • Here's another way:

    df1.set_index(['name', 'age'])\
       .combine_first(df2.set_index(['name', 'age']))\
       .reset_index()\
       .fillna('/')
    

    Output:

       name  age     hobby      job married
    0  anna   34  football  finance       /
    1   ben   19     chess        /      no
    2   dan   26      golf   retail       /
    3  jack   20    hockey  student     yes
    4  lisa   30      golf        /      no
    

    Let's use intrinsic data alignment in pandas by setting the index to columns you want to "join" on then use combine_first the dataframes.