Concatenate multiple columns into a list in a single column


I'd like to combine multiple columns as a list into a single column.

For example, this data frame:

┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 6   │
└─────┴─────┘

into this one:

┌────────────┐
│ combine    │
│ ---        │
│ list [i64] │
╞════════════╡
│ [1, 4]     │
├╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [2, 5]     │
├╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [3, 6]     │
└────────────┘

Right now I'm doing it this way:

df = df.with_column(
    pl.map(
        ['a', 'b'],
        lambda df: pl.Series(
            np.column_stack([df[0].to_numpy(), df[1].to_numpy()]).tolist()
        ),
    ).alias('combine')
)

Is there a better way to do it?


Solution

  • With the landing of this PR, we can reshape a Series/Expr into a Series/Expr of type List. These can then be concatenated per row.

    import polars as pl

    df = pl.DataFrame({
        "a": [1, 2, 3],
        "b": [4, 5, 6]
    })
    
    
    df.select([
        pl.concat_list([
            pl.col("a").reshape((-1, 1)), 
            pl.col("b").reshape((-1, 1))
        ])
    ])
    

    Outputs:

    shape: (3, 1)
    ┌────────────┐
    │ a          │
    │ ---        │
    │ list [i64] │
    ╞════════════╡
    │ [1, 4]     │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ [2, 5]     │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ [3, 6]     │
    └────────────┘
    
    

    Note that we give the shape (-1, 1), where -1 means infer the dimension size. So this reads as (infer the rows, 1 column).

    You can compile polars from source to use this new feature, or wait a few days until it has landed on PyPI.
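
To make the (-1, 1) reshape and the per-row concatenation more concrete, here is a minimal plain-Python sketch of the same idea. It does not use Polars and is not how Polars implements it; the helper names `reshape_rows` and `concat_rows` are hypothetical and exist only for this illustration.

```python
def reshape_rows(values, shape):
    """Split a flat list into rows; a -1 in `shape` is inferred from the data length."""
    rows, cols = shape
    if rows == -1 and cols == -1:
        raise ValueError("only one dimension can be inferred")
    if cols == -1:
        cols = len(values) // rows
    if rows == -1:
        rows = len(values) // cols
    if rows * cols != len(values):
        raise ValueError("shape does not match the number of values")
    # Each row is a slice of `cols` consecutive values.
    return [values[i * cols:(i + 1) * cols] for i in range(rows)]


def concat_rows(*columns):
    """Concatenate list-valued columns row by row."""
    return [sum(parts, []) for parts in zip(*columns)]


# (-1, 1): infer the number of rows, one value per row.
a = reshape_rows([1, 2, 3], (-1, 1))
b = reshape_rows([4, 5, 6], (-1, 1))

# Joining the one-element rows of `a` and `b` gives one list per row.
combine = concat_rows(a, b)
print(combine)
```

With shape (-1, 1), every value becomes its own one-element list, so concatenating the two reshaped columns row by row yields the two-element lists shown in the desired output.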