Is there a polars native way to explode out a column of lists horizontally?
e.g. go from this:
df = pl.DataFrame(
{
"letters": ["a", "a", "b", "c"],
"numbers": [[1, 1], [2, 3], [4, 5], [6, 7]],
}
)
To this?
shape: (4, 3)
┌─────────┬───────────┬───────────┐
│ letters ┆ numbers_a ┆ numbers_b │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═══════════╪═══════════╡
│ a ┆ 1 ┆ 1 │
│ a ┆ 2 ┆ 3 │
│ b ┆ 4 ┆ 5 │
│ c ┆ 6 ┆ 7 │
└─────────┴───────────┴───────────┘
I am aware of the .explode()
method but afaik this is only possible to use vertically
There are two ways you can do this.
The easiest is to .list.to_struct
and then unnest
from string import ascii_lowercase
(
df
.with_columns(
pl.col('numbers').list.to_struct(
fields=lambda idx: f"numbers_{ascii_lowercase[idx]}",
n_field_strategy='max_width'
)
)
.unnest('numbers')
)
from string import ascii_lowercase
max_width=df['numbers'].list.len().max()
(
df
.select(
pl.exclude('numbers'),
*[pl.col('numbers').list.get(x).alias(f"numbers_{ascii_lowercase[x]}")
for x in range(max_width)]
)
)
In either case you're going to get:
shape: (4, 3)
┌─────────┬───────────┬───────────┐
│ letters ┆ numbers_a ┆ numbers_b │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═══════════╪═══════════╡
│ a ┆ 1 ┆ 1 │
│ a ┆ 2 ┆ 3 │
│ b ┆ 4 ┆ 5 │
│ c ┆ 6 ┆ 7 │
└─────────┴───────────┴───────────┘
In the former case you can more easily chain the methods but the second case relies on first determining the max_width
which is less flexible. The second method might be more performant but I don't really know for sure so it's worth checking.