
Make a new column with list of unique values grouped by, or over, another column in Polars


Before Polars 0.18.0, I was able to create a column holding a list of all unique Pokémon per type. I see that Expr.list() has been renamed to implode(), but I'm having trouble replicating the following with the new syntax:

df.with_columns(lst_of_pokemon = pl.col('name').unique().list().over('Type 1'))


Solution

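  • Use the mapping_strategy= argument that was added to .over(). With mapping_strategy='join', each group's result is collected into a list, and that list is joined back onto every row of the group.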

    import polars as pl
    
    df = pl.from_repr("""
    ┌──────┬──────┐
    │ name ┆ type │
    │ ---  ┆ ---  │
    │ str  ┆ i64  │
    ╞══════╪══════╡
    │ a    ┆ 1    │
    │ a    ┆ 1    │
    │ a    ┆ 2    │
    │ b    ┆ 2    │
    │ c    ┆ 3    │
    │ c    ┆ 3    │
    └──────┴──────┘
    """)
    
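    # 'join' collects each group's unique names into a list and writes
    # that list to every row of the group; maintain_order=True keeps the
    # list in first-seen order (plain unique() does not guarantee order)
    df.with_columns(
        lst_of_pokemon=pl.col('name').unique(maintain_order=True).over('type', mapping_strategy='join')
    )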
    
    shape: (6, 3)
    ┌──────┬──────┬────────────────┐
    │ name ┆ type ┆ lst_of_pokemon │
    │ ---  ┆ ---  ┆ ---            │
    │ str  ┆ i64  ┆ list[str]      │
    ╞══════╪══════╪════════════════╡
    │ a    ┆ 1    ┆ ["a"]          │
    │ a    ┆ 1    ┆ ["a"]          │
    │ a    ┆ 2    ┆ ["a", "b"]     │
    │ b    ┆ 2    ┆ ["a", "b"]     │
    │ c    ┆ 3    ┆ ["c"]          │
    │ c    ┆ 3    ┆ ["c"]          │
    └──────┴──────┴────────────────┘