Tags: python, dataframe, python-polars

In a polars dataframe, filter a column of type list by another column of type list


I have a dataframe as below:

Example input:

import polars as pl

df = pl.select(user_id=1, items=[1, 2, 3, 4], popular_items=[3, 4, 5, 6])
┌─────────────┬─────────────┬───────────────┐
│ user_id     ┆ items       ┆ popular_items │
│ ---         ┆ ---         ┆ ---           │
│ i64         ┆ list[i64]   ┆ list[i64]     │
╞═════════════╪═════════════╪═══════════════╡
│ 1           ┆ [1, 2, 3, 4]┆ [3, 4, 5, 6]  │
└─────────────┴─────────────┴───────────────┘

I want to filter the popular_items column by removing, for each user_id, any values that also appear in that row's items column.
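
In plain Python terms, the per-row operation I am after would be roughly the following. This is only a sketch to pin down the requirement; the variable names are illustrative, and what I am looking for is the Polars equivalent.

```python
# One row of the example data, as plain Python lists.
items = [1, 2, 3, 4]
popular_items = [3, 4, 5, 6]

# Keep only the popular items the user does not already have.
seen = set(items)
suggested = [x for x in popular_items if x not in seen]

print(suggested)
```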

I have been trying to get this to work, but without success. I am probably overcomplicating things.

The expected output should be as follows:

┌─────────────┬─────────────┬───────────────┬───────────┐
│ user_id     ┆ items       ┆ popular_items ┆ suggested │
│ ---         ┆ ---         ┆ ---           ┆ ---       │
│ i64         ┆ list[i64]   ┆ list[i64]     ┆ list[i64] │
╞═════════════╪═════════════╪═══════════════╪═══════════╡
│ 1           ┆ [1, 2, 3, 4]┆ [3, 4, 5, 6]  ┆ [5, 6]    │
└─────────────┴─────────────┴───────────────┴───────────┘

The solution seems like it should be simple, but it has escaped me despite trying different approaches for some time.

Any help would be greatly appreciated!


Solution

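  • Update: .list.set_difference() has since been added to Polars. It computes the
    difference row by row; as the output below shows, the result is not necessarily
    in the original order.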

    df.with_columns(
        suggested=pl.col("popular_items").list.set_difference("items")
    )
    
    shape: (1, 4)
    ┌─────────┬──────────────┬───────────────┬───────────┐
    │ user_id ┆ items        ┆ popular_items ┆ suggested │
    │ ---     ┆ ---          ┆ ---           ┆ ---       │
    │ i32     ┆ list[i64]    ┆ list[i64]     ┆ list[i64] │
    ╞═════════╪══════════════╪═══════════════╪═══════════╡
    │ 1       ┆ [1, 2, 3, 4] ┆ [3, 4, 5, 6]  ┆ [6, 5]    │
    └─────────┴──────────────┴───────────────┴───────────┘