import polars as pl
df = pl.DataFrame({'a': [[1, 2, 3], [8, 9, 4]], 'b': [[2, 3, 4], [4, 5, 6]]})
Given the DataFrame df:
a b
[1, 2, 3] [2, 3, 4]
[8, 9, 4] [4, 5, 6]
I would like to get a column c that is the row-wise intersection of a and b:
a b c
[1, 2, 3] [2, 3, 4] [2, 3]
[8, 9, 4] [4, 5, 6] [4]
I know I can use the apply function with Python's set intersection, but I want to do it using Polars expressions.
Polars has dedicated set_* methods for lists.
pl.Config(fmt_table_cell_list_len=10, fmt_str_lengths=80) # increase repr len
df.with_columns(
    intersection = pl.col("a").list.set_intersection("b"),
    difference = pl.col("a").list.set_difference("b"),
    symmetric_difference = pl.col("a").list.set_symmetric_difference("b"),
    union = pl.col("a").list.set_union("b")
)
shape: (2, 6)
┌───────────┬───────────┬──────────────┬────────────┬──────────────────────┬─────────────────┐
│ a ┆ b ┆ intersection ┆ difference ┆ symmetric_difference ┆ union │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╪══════════════╪════════════╪══════════════════════╪═════════════════╡
│ [1, 2, 3] ┆ [2, 3, 4] ┆ [2, 3] ┆ [1] ┆ [1, 4] ┆ [1, 2, 3, 4] │
│ [8, 9, 4] ┆ [4, 5, 6] ┆ [4] ┆ [8, 9] ┆ [8, 9, 5, 6] ┆ [8, 9, 4, 5, 6] │
└───────────┴───────────┴──────────────┴────────────┴──────────────────────┴─────────────────┘