Say I have this:
import numpy
import polars

# Three integer columns of random two-digit values
df = polars.DataFrame(dict(
    j=numpy.random.randint(10, 99, 10),
    k=numpy.random.randint(10, 99, 10),
    l=numpy.random.randint(10, 99, 10),
))
print(df)
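shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 32  ┆ 82  ┆ 34  │
│ 67  ┆ 40  ┆ 53  │
│ 11  ┆ 81  ┆ 86  │
│ 10  ┆ 13  ┆ 36  │
│ 70  ┆ 80  ┆ 62  │
│ 91  ┆ 31  ┆ 90  │
│ 18  ┆ 59  ┆ 51  │
│ 98  ┆ 67  ┆ 92  │
│ 23  ┆ 13  ┆ 25  │
│ 57  ┆ 78  ┆ 74  │
└─────┴─────┴─────┘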
and I want to apply the same when/then/otherwise condition on multiple columns:
dfj = (df
    .select(
        polars
        .when(polars.selectors.numeric() < 50)
        .then(polars.lit(1))
        .otherwise(polars.lit(2))
    )
)
This fails with:
polars.exceptions.DuplicateError: the name: 'literal' is duplicate
How do I make this use the currently selected column as the alias? I.e. I want the equivalent of this:
dfj = (df
    .select(
        polars
        .when(polars.col(c) < 50)
        .then(polars.lit(1))
        .otherwise(polars.lit(2))
        .alias(c)
        for c in df.columns
    )
)
print(dfj)
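shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 1   │
│ 2   ┆ 1   ┆ 2   │
│ 1   ┆ 2   ┆ 2   │
│ 1   ┆ 1   ┆ 1   │
│ 2   ┆ 2   ┆ 2   │
│ 2   ┆ 1   ┆ 2   │
│ 1   ┆ 2   ┆ 2   │
│ 2   ┆ 2   ┆ 2   │
│ 1   ┆ 1   ┆ 1   │
│ 2   ┆ 2   ┆ 2   │
└─────┴─────┴─────┘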
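When the then/otherwise branches are literal values, every expanded expression ends up named 'literal', which is why the names collide. You can append .name.keep() to keep the original column names instead: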
(df
    .select(
        polars
        .when(polars.selectors.numeric() < 50)
        .then(polars.lit(1))
        .otherwise(polars.lit(2))
        .name.keep()
    )
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 2   │
│ 2   ┆ 1   ┆ 1   │
│ 1   ┆ 2   ┆ 1   │
│ 2   ┆ 2   ┆ 1   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 2   │
│ 2   ┆ 2   ┆ 1   │
│ 2   ┆ 2   ┆ 2   │
│ 1   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 1   │
└─────┴─────┴─────┘