Tags: python, pandas, dataframe, set, set-intersection

Pair-wise intersection set operation between column values in a data frame


I have a data frame with one column. Each value in this column is a list. For example,

     A
0   [1, 3, 4]
1   [43, 1, 42]
2   [50, 3]

I want to compute the set intersection between every pair of lists to find their common elements, and produce a data frame like the one below.

    0           1           2
0   [1, 3, 4]   [1]         [3]
1   [1]         [43, 1, 42] []
2   [3]         []          [50, 3]

Is there an elegant way of doing this without explicitly looping over the rows?


Solution

  • We can convert every value in A to a set with apply(set), then broadcast the set intersection operator (&) over the resulting array so that every pair of rows is intersected:

    import pandas as pd
    
    df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
    
    # Convert each list to a set; .values gives a 1-D object array of sets
    a = df['A'].apply(set).values
    # a[:, None] is a column, a is a row: & broadcasts to all row pairs
    new_df = pd.DataFrame(a[:, None] & a)
    

    new_df:

               0            1        2
    0  {1, 3, 4}          {1}      {3}
    1        {1}  {1, 42, 43}       {}
    2        {3}           {}  {50, 3}
    

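    If lists are needed instead of sets, np.vectorize can convert each cell back to a list (it can also replace apply for the initial conversion to sets). Note that the element order within each list follows set iteration order, which is not guaranteed: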

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})
    
    # Convert to set (using vectorize instead of apply):
    a = np.vectorize(set, otypes=['O'])(df['A'])
    # Broadcast set intersection and convert back to list
    new_df = pd.DataFrame(
        np.vectorize(list, otypes=['O'])(a[:, None] & a)
    )
    

    new_df:

               0            1        2
    0  [1, 3, 4]          [1]      [3]
    1        [1]  [1, 42, 43]       []
    2        [3]           []  [50, 3]
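
To make the broadcasting step above more concrete, here is a minimal sketch that compares the broadcast result with an explicit nested loop over the same sets. The names `broadcast`, `looped` and the final sorting step are illustrative additions, not part of the original answer. Sorting each cell is one possible way to get lists with a predictable element order.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 3, 4], [43, 1, 42], [50, 3]]})

# 1-D object array of sets, shape (3,)
a = df['A'].apply(set).to_numpy()

# a[:, None] has shape (3, 1). Broadcasting it against a (shape (3,))
# evaluates a[i] & a[j] for every pair (i, j), giving shape (3, 3).
broadcast = a[:, None] & a

# The same table built with explicit loops, for comparison only.
looped = [[a[i] & a[j] for j in range(len(a))] for i in range(len(a))]

# Optional (illustrative): label rows and columns with the original index
# and turn each set into a sorted list so the element order is stable.
new_df = pd.DataFrame(broadcast, index=df.index, columns=df.index)
new_df = new_df.apply(lambda col: col.map(sorted))
```

Under the hood NumPy still calls the set intersection once per pair of rows, so the broadcast form is mainly more concise than the loop, not asymptotically faster.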