Search code examples
python-3.xpandasdata-wranglingvenn-diagramexploratory-data-analysis

How to show the logical relation between two Pandas dataframes?


I have two Pandas dataframes df1 and df2, both have a column for keys, respectively labeled id1 and id2. How could I simply show which are the keys that are common to both dataframes, those who are only in df1 those which are only in df2? The idea would be to show a Venn diagram with

  • the number of elements only in df1
  • the number of elements only in df2
  • the number of elements in df1 and df2

A visual representation would be nice but not necessary.


Solution

  • One way could be an outer merge with an indicator column:

    Let’s try with this simple example:

    >>> df1
      id1
    0   a
    1   b
    2   c
    >>> df2
      id2
    0   b
    1   c
    2   d
    3   e
    >>> mrg = pd.merge(df1['id1'].rename('id'), df2['id2'].rename('id'), on='id',
    ...                how='outer', indicator=True)
    >>> mrg
      id      _merge
    0  a   left_only
    1  b        both
    2  c        both
    3  d  right_only
    4  e  right_only
    

    Now you get a column id with the ids, and a column _merge that tells you if they’re only in the left, the right, or in both dataframes.

    Getting the number of elements is pretty easy from there:

    >>> mrg['_merge'].value_counts()
    right_only    2
    both          2
    left_only     1
    Name: _merge, dtype: int64