Search code examples
pythonpandasmergeouter-join

Outer join to check existence each records of two pandas dataframes like SQL


I have to tables looks like following:

Table T1

ColumnA ColumnB
A 1
A 3
B 1
C 2

Table T2

ColumnA ColumnB
A 1
A 4
B 1
D 2

in SQL I will do following query to check the existence of each record

select 
COALESCE(T1.ColumnA,T2.ColumnA) as ColumnA
,T1.ColumnB as ExistT1
,T2.ColumnB as ExistT2
from T1
full join T2 on 
   T1.ColumnA=T2.ColumnA
   and T1.ColumnB=T2.ColumnB
where 
 (T1.ColumnA is null or T2.ColumnA is null)

I have tried many way in Pandas like concate, join, merge, etc, but it seems that the two merge keys would be combined into one. I think the problem is that I want to check is not 'data columns' but 'key columns'. Is there any good idea to do this in Python? Thanks!

ColumnA ExistT1 ExistT2
A 3 null
A null 4
C 2 null
D null 2

Solution

  • pd.merge has an indicator parameter that could be helpful here:

    (t1
    .merge(t2, how = 'outer', indicator=True)
    .loc[lambda df: df._merge!="both"]
    .assign(ExistT1 = lambda df: df.ColumnB.where(df._merge.eq('left_only')), 
            ExistT2 = lambda df: df.ColumnB.where(df._merge.eq('right_only')) )
    .drop(columns=['ColumnB', '_merge'])
    )
    
      ColumnA  ExistT1  ExistT2
    1       A      3.0      NaN
    3       C      2.0      NaN
    4       A      NaN      4.0
    5       D      NaN      2.0