Some of the columns in my Polars DataFrame have the dtype pl.List(pl.Struct). I'm trying to replace each of these columns with multiple columns, one per struct field, where each new column is a list of scalar values.
Here's an example of a column I'm trying to change:
import polars as pl

df = pl.DataFrame({
    "column_0": [
        [{"field_1": "a", "field_2": 1}, {"field_1": "b", "field_2": 2}],
        [{"field_1": "c", "field_2": 3}],
    ]
})

col_name = "column_0"
df.select(
    pl.col(col_name).list.eval(
        pl.element().struct.unnest()
    )
)
I expected to get something like this:
shape: (2, 2)
┌────────────┬───────────┐
│ field_1    ┆ field_2   │
│ ---        ┆ ---       │
│ list[str]  ┆ list[i64] │
╞════════════╪═══════════╡
│ ["a", "b"] ┆ [1, 2]    │
│ ["c"]      ┆ [3]       │
└────────────┴───────────┘
Instead, I only get the last field (in this case, 'field_2'):
shape: (2, 1)
┌───────────┐
│ column_0  │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [3]       │
└───────────┘
You could unpack the lists/structs with .explode() + .unnest(), then group the rows back together on a row index.
(
    df.with_row_index()
    .explode("column_0")
    .unnest("column_0")
    .group_by("index", maintain_order=True)
    .all()
)
shape: (2, 3)
┌───────┬────────────┬───────────┐
│ index ┆ field_1    ┆ field_2   │
│ ---   ┆ ---        ┆ ---       │
│ u32   ┆ list[str]  ┆ list[i64] │
╞═══════╪════════════╪═══════════╡
│ 0     ┆ ["a", "b"] ┆ [1, 2]    │
│ 1     ┆ ["c"]      ┆ [3]       │
└───────┴────────────┴───────────┘