Say I have this:
import polars as pl
import polars.selectors as cs
df = pl.select(
    j=pl.int_range(10, 99).sample(10, with_replacement=True),
    k=pl.int_range(10, 99).sample(10, with_replacement=True),
    l=pl.int_range(10, 99).sample(10, with_replacement=True),
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j ┆ k ┆ l │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 71 ┆ 79 ┆ 67 │
│ 26 ┆ 42 ┆ 55 │
│ 12 ┆ 43 ┆ 85 │
│ 92 ┆ 96 ┆ 14 │
│ 95 ┆ 26 ┆ 62 │
│ 75 ┆ 14 ┆ 56 │
│ 61 ┆ 41 ┆ 75 │
│ 74 ┆ 97 ┆ 70 │
│ 73 ┆ 32 ┆ 10 │
│ 66 ┆ 98 ┆ 40 │
└─────┴─────┴─────┘
and I want to apply the same when/then/otherwise condition to multiple columns:
df.select(
    pl.when(cs.numeric() < 50)
    .then(1)
    .otherwise(2)
)
This fails with:
DuplicateError: the name 'literal' is duplicate
How do I make this use the currently selected column as the alias? I.e. I want the equivalent of this:
df.select(
    pl.when(pl.col(c) < 50)
    .then(1)
    .otherwise(2)
    .alias(c)
    for c in df.columns
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j ┆ k ┆ l │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 2 ┆ 2 ┆ 2 │
│ 1 ┆ 1 ┆ 2 │
│ 1 ┆ 1 ┆ 2 │
│ 2 ┆ 2 ┆ 1 │
│ 2 ┆ 1 ┆ 2 │
│ 2 ┆ 1 ┆ 2 │
│ 2 ┆ 1 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 1 │
└─────┴─────┴─────┘
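You can use .name.keep(). The output name of a when/then/otherwise expression is taken from the then() branch, which here is the literal 1, so every expanded column ends up named 'literal'. .name.keep() restores each column's original name: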
df.select(
    pl.when(cs.numeric() < 50)
    .then(1)
    .otherwise(2)
    .name.keep()
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j ┆ k ┆ l │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 1 ┆ 1 ┆ 2 │
│ 2 ┆ 1 ┆ 1 │
│ 1 ┆ 2 ┆ 1 │
│ 2 ┆ 2 ┆ 1 │
│ 2 ┆ 1 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 2 ┆ 1 │
│ 2 ┆ 2 ┆ 2 │
│ 1 ┆ 1 ┆ 2 │
│ 2 ┆ 2 ┆ 1 │
└─────┴─────┴─────┘