
Polars: intersection of list columns in a DataFrame


import polars as pl

df = pl.DataFrame({'a': [[1, 2, 3], [8, 9, 4]], 'b': [[2, 3, 4], [4, 5, 6]]})

Given the DataFrame df:

    a           b
[1, 2, 3]   [2, 3, 4]
[8, 9, 4]   [4, 5, 6]

I would like to add a column c that holds the row-wise intersection of a and b:

    a           b          c
[1, 2, 3]   [2, 3, 4]    [2, 3]
[8, 9, 4]   [4, 5, 6]     [4]

I know I can use the apply function with Python's set intersection, but I want to do it with native Polars expressions.
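As a reference point, the row-wise result being asked for can be sketched in plain Python. This is only a baseline for checking output, not the Polars approach; the helper name `row_intersection` is hypothetical, and sorting is an assumption made here so the result is deterministic.

```python
# Same data as the DataFrame above, as plain Python lists.
a = [[1, 2, 3], [8, 9, 4]]
b = [[2, 3, 4], [4, 5, 6]]


def row_intersection(xs, ys):
    """Return the sorted elements that appear in both lists (hypothetical helper)."""
    return sorted(set(xs) & set(ys))


# Pair up the rows of a and b and intersect each pair.
c = [row_intersection(xs, ys) for xs, ys in zip(a, b)]
print(c)  # [[2, 3], [4]]
```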


Solution

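  • Polars has dedicated set_* methods in the list namespace: list.set_intersection, list.set_difference, list.set_symmetric_difference and list.set_union.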

    # Widen the repr so that list cells and long strings are not truncated.
    pl.Config(fmt_table_cell_list_len=10, fmt_str_lengths=80)

    df.with_columns(
        intersection=pl.col("a").list.set_intersection("b"),
        difference=pl.col("a").list.set_difference("b"),
        symmetric_difference=pl.col("a").list.set_symmetric_difference("b"),
        union=pl.col("a").list.set_union("b"),
    )
    
    shape: (2, 6)
    ┌───────────┬───────────┬──────────────┬────────────┬──────────────────────┬─────────────────┐
    │ a         ┆ b         ┆ intersection ┆ difference ┆ symmetric_difference ┆ union           │
    │ ---       ┆ ---       ┆ ---          ┆ ---        ┆ ---                  ┆ ---             │
    │ list[i64] ┆ list[i64] ┆ list[i64]    ┆ list[i64]  ┆ list[i64]            ┆ list[i64]       │
    ╞═══════════╪═══════════╪══════════════╪════════════╪══════════════════════╪═════════════════╡
    │ [1, 2, 3] ┆ [2, 3, 4] ┆ [2, 3]       ┆ [1]        ┆ [1, 4]               ┆ [1, 2, 3, 4]    │
    │ [8, 9, 4] ┆ [4, 5, 6] ┆ [4]          ┆ [8, 9]     ┆ [8, 9, 5, 6]         ┆ [8, 9, 4, 5, 6] │
    └───────────┴───────────┴──────────────┴────────────┴──────────────────────┴─────────────────┘