I want to add a column, result, to a polars DataFrame that contains a list of the names of the columns whose value in that row is greater than zero.
So given this:
import polars as pl
df = pl.DataFrame({"apple": [1, 0, 2, 0], "banana": [1, 0, 0, 1]})
cols = ["apple", "banana"]
How do I get:
shape: (4, 3)
┌───────┬────────┬─────────────────────┐
│ apple ┆ banana ┆ result              │
│ ---   ┆ ---    ┆ ---                 │
│ i64   ┆ i64    ┆ list[str]           │
╞═══════╪════════╪═════════════════════╡
│ 1     ┆ 1      ┆ ["apple", "banana"] │
│ 0     ┆ 0      ┆ []                  │
│ 2     ┆ 0      ┆ ["apple"]           │
│ 0     ┆ 1      ┆ ["banana"]          │
└───────┴────────┴─────────────────────┘
All I have so far is the truth values:
df.with_columns(pl.concat_list(pl.col(cols).gt(0)).alias("result"))
shape: (4, 3)
┌───────┬────────┬────────────────┐
│ apple ┆ banana ┆ result         │
│ ---   ┆ ---    ┆ ---            │
│ i64   ┆ i64    ┆ list[bool]     │
╞═══════╪════════╪════════════════╡
│ 1     ┆ 1      ┆ [true, true]   │
│ 0     ┆ 0      ┆ [false, false] │
│ 2     ┆ 0      ┆ [true, false]  │
│ 0     ┆ 1      ┆ [false, true]  │
└───────┴────────┴────────────────┘
Here's one way: use pl.when with pl.lit inside the concat_list to get either the literal column name or null for each column, then call list.drop_nulls on the result:
df.with_columns(
    result=pl.concat_list(
        pl.when(pl.col(col) > 0).then(pl.lit(col)) for col in df.columns
    ).list.drop_nulls()
)
shape: (4, 3)
┌───────┬────────┬─────────────────────┐
│ apple ┆ banana ┆ result              │
│ ---   ┆ ---    ┆ ---                 │
│ i64   ┆ i64    ┆ list[str]           │
╞═══════╪════════╪═════════════════════╡
│ 1     ┆ 1      ┆ ["apple", "banana"] │
│ 0     ┆ 0      ┆ []                  │
│ 2     ┆ 0      ┆ ["apple"]           │
│ 0     ┆ 1      ┆ ["banana"]          │
└───────┴────────┴─────────────────────┘