Polars struct.field(list[str]) returns a single column when dealing with list[Struct]


Some columns in my Polars DataFrame have the dtype pl.List(pl.Struct). I want to replace each of them with multiple columns, one per struct field, each holding a list of scalar values.

Here's an example of a column I'm trying to change:

import polars as pl

df = pl.DataFrame({
    "column_0": [
        [{"field_1": "a", "field_2": 1}, {"field_1": "b", "field_2": 2}],
        [{"field_1": "c", "field_2": 3}]
    ]
})

col_name = "column_0"
df.select(
    pl.col(col_name).list.eval(
        pl.element().struct.field("*")
    )
)

I expected to get something like this:

shape: (2, 2)
┌────────────┬───────────┐
│ field_1    ┆ field_2   │
│ ---        ┆ ---       │
│ list[str]  ┆ list[i64] │
╞════════════╪═══════════╡
│ ["a", "b"] ┆ [1, 2]    │
│ ["c"]      ┆ [3]       │
└────────────┴───────────┘

Instead, I only get the last field (in this case, 'field_2'):

shape: (2, 1)
┌───────────┐
│ column_0  │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [3]       │
└───────────┘

Solution

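  • .list.eval() returns a single list column, so an expression that expands to several struct fields cannot fan out into several columns there. Instead, you could unpack the lists/structs with .explode() + .unnest() and then group the rows back together by a row index.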

    (df.with_row_index()
       .explode("column_0")
       .unnest("column_0")
       .group_by("index", maintain_order=True)
       .all()
    )
    
    shape: (2, 3)
    ┌───────┬────────────┬───────────┐
    │ index ┆ field_1    ┆ field_2   │
    │ ---   ┆ ---        ┆ ---       │
    │ u32   ┆ list[str]  ┆ list[i64] │
    ╞═══════╪════════════╪═══════════╡
    │ 0     ┆ ["a", "b"] ┆ [1, 2]    │
    │ 1     ┆ ["c"]      ┆ [3]       │
    └───────┴────────────┴───────────┘