Search code examples
pythonpython-polars

How can I consolidate all rows with the same ID in Polars?


I have a Polars dataframe with a lot of duplicate data I would like to consolidate.

Input:

shape: (3, 2)
┌─────┬──────┐
│ id  ┆ data │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ a    │
│ 1   ┆ b    │
│ 1   ┆ c    │
└─────┴──────┘

My current (non-working) solution:

df = pl.DataFrame({'id': [1, 1, 1], 'data': ['a', 'b', 'c']})
df = df.join(df.select('id', 'data'), on='id')

Output:

shape: (9, 3)
┌─────┬──────┬────────────┐
│ id  ┆ data ┆ data_right │
│ --- ┆ ---  ┆ ---        │
│ i64 ┆ str  ┆ str        │
╞═════╪══════╪════════════╡
│ 1   ┆ a    ┆ a          │
│ 1   ┆ b    ┆ a          │
│ 1   ┆ c    ┆ a          │
│ 1   ┆ a    ┆ b          │
│ 1   ┆ b    ┆ b          │
│ 1   ┆ c    ┆ b          │
│ 1   ┆ a    ┆ c          │
│ 1   ┆ b    ┆ c          │
│ 1   ┆ c    ┆ c          │
└─────┴──────┴────────────┘

Desired output:

shape: (1, 4)
┌─────┬────────┬────────┬────────┐
│ id  ┆ data_1 ┆ data_2 ┆ data_3 │
│ --- ┆ ---    ┆ ---    ┆ ---    │
│ str ┆ str    ┆ str    ┆ str    │
╞═════╪════════╪════════╪════════╡
│ 1   ┆ a      ┆ b      ┆ c      │
└─────┴────────┴────────┴────────┘

It seems like a self join would be the way to get a table with all the columns I want, but I'm unsure how to write a self join that would include multiple columns instead of just a bunch of rows with only two, and looping through self joins does not seem like the correct thing to do as it quickly balloons in size. This isn't specifically a Polars problem, but I am working in Python-Polars


Solution

  • The following gives you a df in the format you want:

    import polars as pl
    
    df = (
        pl.DataFrame({"id": [1, 1, 1], "data": ["a", "b", "c"]})
        .group_by("id")
        .agg(pl.col("data"))
        .with_columns(structure=pl.col("data").list.to_struct())
        .unnest("structure")
        .drop("data")
    )
    
    print(df)
    """
    ┌─────┬─────────┬─────────┬─────────┐
    │ id  ┆ field_0 ┆ field_1 ┆ field_2 │
    │ --- ┆ ---     ┆ ---     ┆ ---     │
    │ i64 ┆ str     ┆ str     ┆ str     │
    ╞═════╪═════════╪═════════╪═════════╡
    │ 1   ┆ a       ┆ b       ┆ c       │
    └─────┴─────────┴─────────┴─────────┘
    """