Say I have:
In [1]: df = pl.DataFrame({'a': [[1,2], [3,4]]})
In [2]: df
Out[2]:
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [3, 4]    │
└───────────┘
I know that all elements of 'a' are lists of the same length.
I can do:
In [10]: df.select(pl.col('a').list.get(i).alias(f'a_{i}') for i in range(2))
Out[10]:
shape: (2, 2)
┌─────┬─────┐
│ a_0 ┆ a_1 │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
│ 3   ┆ 4   │
└─────┴─────┘
but this involves hard-coding the 2.
Is there a way to do this without hard-coding the 2? I may not know in advance how many elements there are in the lists (I just know that they all have the same number of elements).
You can convert the list to a struct and then call .unnest():
df.with_columns(pl.col("a").list.to_struct()).unnest("a")
shape: (2, 2)
┌─────────┬─────────┐
│ field_0 ┆ field_1 │
│ ---     ┆ ---     │
│ i64     ┆ i64     │
╞═════════╪═════════╡
│ 1       ┆ 2       │
│ 3       ┆ 4       │
└─────────┴─────────┘
Warning: If your lists are not all the same length, you must set n_field_strategy to "max_width":
.list.to_struct("max_width")
By default, it uses the length of the first list found, which would truncate your data if longer lists appear later on.