I have two Pandas dataframes df1
and df2
, both have a column for keys, respectively labeled id1
and id2
. How could I simply show which are the keys that are common to both dataframes, those who are only in df1
those which are only in df2
? The idea would be to show a Venn diagram with
df1
df2
df1
and df2
A visual representation would be nice but not necessary.
One way could be an outer merge with an indicator column:
Let’s try with this simple example:
>>> df1
id1
0 a
1 b
2 c
>>> df2
id2
0 b
1 c
2 d
3 e
>>> mrg = pd.merge(df1['id1'].rename('id'), df2['id2'].rename('id'), on='id',
... how='outer', indicator=True)
>>> mrg
id _merge
0 a left_only
1 b both
2 c both
3 d right_only
4 e right_only
Now you get a column id
with the ids, and a column _merge
that tells you if they’re only in the left, the right, or in both dataframes.
Getting the number of elements is pretty easy from there:
>>> mrg['_merge'].value_counts()
right_only 2
both 2
left_only 1
Name: _merge, dtype: int64