I am trying to transform groups of columns based on certain conditions. I am having trouble looping over groups of columns in my select statement. Simplified example (actual df has many more columns in avars and bvars):
import polars as pl

df = pl.DataFrame(
    {
        'a': [0, 1, 2, 4],
        'b': [1, 0, 3, 5],
        'w': [4, 7, 5, 8],
        'x': [10, 20, 25, 30],
        'y': [15, 3, 16, 88],
        'z': [22, 17, 4, 32],
    }
)
avars = ['w', 'x']
bvars = ['y', 'z']
This works:
df.select(
    (pl.when(pl.col('a') > 0)
     .then(pl.col(var) / pl.col('a'))
     .otherwise(pl.col(var)) for var in avars),
)
But I get an error message when I try:
df.select(
    (pl.when(pl.col('a') > 0)
     .then(pl.col(var) / pl.col('a'))
     .otherwise(pl.col(var)) for var in avars),
    (pl.when(pl.col('b') > 0)
     .then(pl.col(var) / pl.col('b'))
     .otherwise(pl.col(var)) for var in bvars),
)
Even this fails:
df.select(
    pl.col('a'),
    pl.col('b'),
    (pl.when(pl.col('a') > 0)
     .then(pl.col(var) / pl.col('a'))
     .otherwise(pl.col(var)) for var in avars),
)
It seems that selecting any columns in addition to the group of columns in the for loop causes an error message. Can someone point me in the right direction?
You need to unpack the generator expression with a *, as select won't accept a generator when other arguments are present.
import polars as pl
df = pl.DataFrame({"a": [0, 3], "b": [3, 4], "c": [5, 6], "d": [7, 9]})
avars = ["c", "d"]
df = df.select(
    pl.col("a"),
    pl.col("b"),
    *(
        pl.when(pl.col("a") > 0).then(pl.col(var) / pl.col("a")).otherwise(pl.col(var))
        for var in avars
    ),
)
print(df)
Results:
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ 0   ┆ 3   ┆ 5.0 ┆ 7.0 │
│ 3   ┆ 4   ┆ 2.0 ┆ 3.0 │
└─────┴─────┴─────┴─────┘