I have to tables looks like following:
Table T1
ColumnA | ColumnB |
---|---|
A | 1 |
A | 3 |
B | 1 |
C | 2 |
Table T2
ColumnA | ColumnB |
---|---|
A | 1 |
A | 4 |
B | 1 |
D | 2 |
in SQL I will do following query to check the existence of each record
select
COALESCE(T1.ColumnA,T2.ColumnA) as ColumnA
,T1.ColumnB as ExistT1
,T2.ColumnB as ExistT2
from T1
full join T2 on
T1.ColumnA=T2.ColumnA
and T1.ColumnB=T2.ColumnB
where
(T1.ColumnA is null or T2.ColumnA is null)
I have tried many way in Pandas like concate, join, merge, etc, but it seems that the two merge keys would be combined into one. I think the problem is that I want to check is not 'data columns' but 'key columns'. Is there any good idea to do this in Python? Thanks!
ColumnA | ExistT1 | ExistT2 |
---|---|---|
A | 3 | null |
A | null | 4 |
C | 2 | null |
D | null | 2 |
pd.merge
has an indicator
parameter that could be helpful here:
(t1
.merge(t2, how = 'outer', indicator=True)
.loc[lambda df: df._merge!="both"]
.assign(ExistT1 = lambda df: df.ColumnB.where(df._merge.eq('left_only')),
ExistT2 = lambda df: df.ColumnB.where(df._merge.eq('right_only')) )
.drop(columns=['ColumnB', '_merge'])
)
ColumnA ExistT1 ExistT2
1 A 3.0 NaN
3 C 2.0 NaN
4 A NaN 4.0
5 D NaN 2.0