I'd like to combine multiple columns as a list into a single column.
For example, this data frame:
import polars as pl
import numpy as np
df = pl.from_repr("""
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
""")
into this one:
┌────────────┐
│ combine │
│ --- │
│ list [i64] │
╞════════════╡
│ [1, 4] │
│ [2, 5] │
│ [3, 6] │
└────────────┘
Right now I'm doing it this way:
df = df.with_columns(
    pl.map_batches(
        ["a", "b"],
        # cols is the list of Series for columns "a" and "b"
        lambda cols: pl.Series(
            np.column_stack([cols[0].to_numpy(), cols[1].to_numpy()]).tolist()
        ),
    ).alias("combine")
)
Is there a better way to do it?
Update: reshape now returns a (fixed-width) Array type in Polars.
For lists, pl.concat_list("a", "b") can be used directly.
Original answer
With the landing of this PR, we can reshape a Series/Expr into a Series/Expr of type List. These can then be concatenated per row.
import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6]
})
df.select(
    pl.concat_list(
        pl.col("a").reshape((-1, 1)),
        pl.col("b").reshape((-1, 1))
    )
)
Outputs:
shape: (3, 1)
┌────────────┐
│ a │
│ --- │
│ list [i64] │
╞════════════╡
│ [1, 4] │
│ [2, 5] │
│ [3, 6] │
└────────────┘
Note that we give the shape (-1, 1), where -1 means infer the dimension size. So this reads as (infer the rows, 1 column).
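The -1 convention is the same one NumPy uses, so it can be illustrated without depending on a particular Polars version. This is only an analogy sketched with NumPy arrays, not the Polars call itself:

```python
import numpy as np

values = np.array([1, 2, 3])

# -1 asks for the row count to be inferred; 1 fixes the column count.
column = values.reshape((-1, 1))

print(column.shape)     # (3, 1)
print(column.tolist())  # [[1], [2], [3]]
```

Each value ends up in its own one-element row, which is what makes the row-wise concatenation above possible.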
You can compile Polars from source to use this new feature, or wait a few days until it has landed on PyPI.