I'm working with a deeply nested DataFrame (not good practice, I know), and I'd like to express something like "select field X for all structs in list Y".
An example of the data structure:
import polars as pl
data = {
    "a": [
        [
            {"x": [1, 2, 3], "y": [4, 5, 6]},
            {"x": [2, 3, 4], "y": [3, 4, 5]},
        ]
    ],
}
df = pl.DataFrame(data)
In this case, I'd like to select field "x" in both of the structs and gather them into a df with two series, call them "x_1" and "x_2".
In other words, the desired output is:
┌───────────┬───────────┐
│ x_1 ┆ x_2 │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘
I don't know the length of the list ahead of time, and I'd like to do this dynamically (i.e. without hard-coding the field names). Is this possible using Polars expressions?
Thanks in advance!
Update: Perhaps a simpler approach using .unstack():
(
    df.select(pl.col("a").flatten().struct.field("x"))
    .unstack(step=1)
)
shape: (1, 2)
┌───────────┬───────────┐
│ x_0 ┆ x_1 │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘
Original answer:
df.select(
    pl.col("a").list.eval(pl.element().struct["x"])
    .list.to_struct("max_width", lambda idx: f"x_{idx + 1}")
).unnest("a")
shape: (1, 2)
┌───────────┬───────────┐
│ x_1 ┆ x_2 │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘
.list.eval() loops through each list element so that we can extract the struct field from each one:
df.select(
    pl.col("a").list.eval(pl.element().struct["x"])
)
# shape: (1, 1)
# ┌────────────────────────┐
# │ a │
# │ --- │
# │ list[list[i64]] │
# ╞════════════════════════╡
# │ [[1, 2, 3], [2, 3, 4]] │
# └────────────────────────┘
.list.to_struct() converts the list to a struct, which will allow us to turn each inner list into its own column:
df.select(
    pl.col("a").list.eval(pl.element().struct["x"])
    .list.to_struct("max_width", lambda idx: f"x_{idx + 1}")
)
# shape: (1, 1)
# ┌───────────────────────┐
# │ a │
# │ --- │
# │ struct[2] │
# ╞═══════════════════════╡
# │ {[1, 2, 3],[2, 3, 4]} │
# └───────────────────────┘
.unnest() then unpacks the struct to create the individual columns.