
polars: list to columns, without `get`


Say I have:

In [1]: import polars as pl

In [2]: df = pl.DataFrame({'a': [[1,2], [3,4]]})

In [3]: df
Out[3]:
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [3, 4]    │
└───────────┘

I know that all elements of 'a' are lists of the same length.

I can do:

In [10]: df.select(pl.col('a').list.get(i).alias(f'a_{i}') for i in range(2))
Out[10]:
shape: (2, 2)
┌─────┬─────┐
│ a_0 ┆ a_1 │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
│ 3   ┆ 4   │
└─────┴─────┘

but this involves hard-coding the 2.

Is there a way to do this without hard-coding the 2? I may not know in advance how many elements there are in the lists (I just know that they all have the same number of elements).


Solution

  • You can convert the list to a struct and then call .unnest() on it:

    df.with_columns(pl.col("a").list.to_struct()).unnest("a")
    
    shape: (2, 2)
    ┌─────────┬─────────┐
    │ field_0 ┆ field_1 │
    │ ---     ┆ ---     │
    │ i64     ┆ i64     │
    ╞═════════╪═════════╡
    │ 1       ┆ 2       │
    │ 3       ┆ 4       │
    └─────────┴─────────┘
    

    Warning: If your lists are not all the same length, you must set n_field_strategy to "max_width":

    df.with_columns(pl.col("a").list.to_struct(n_field_strategy="max_width")).unnest("a")
    

    By default (n_field_strategy="first_non_null"), the number of fields is taken from the length of the first non-null list found.

    This would silently truncate your data if longer lists appear later in the column.