Transforming multiple columns using a loop

I am trying to transform groups of columns based on certain conditions. I am having trouble looping over groups of columns in my select statement. Simplified example (actual df has many more columns in avars and bvars):

import polars as pl

df = pl.DataFrame(
    {'a': [0,1,2,4],
     'b': [1,0,3,5],
     'w': [4,7,5,8],
     'x': [10, 20, 25, 30],
     'y': [15,3,16,88],
     'z': [22,17,4,32]
     }
)
avars=['w','x']
bvars=['y','z']

This works:

df.select(
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
)

But I get an error message when I try this:

df.select(
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
    (pl.when(pl.col('b')>0)
    .then(pl.col(var)/pl.col('b')) 
    .otherwise(pl.col(var)) for var in bvars),
)

Even this fails:

df.select(
    pl.col('a'),
    pl.col('b'),
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
)

It seems that selecting any columns in addition to the group of columns in the for loop causes an error message. Can someone point me in the right direction?

Solution

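A generator expression works when it is the only argument to select, because select can take a single iterable of expressions. As soon as other arguments are present, each positional argument needs to be an expression itself, so you need to unpack the generator with a *.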

import polars as pl

df = pl.DataFrame({"a": [0, 3], "b": [3, 4], "c": [5, 6], "d": [7, 9]})

avars = ["c", "d"]

df = df.select(
    pl.col("a"),
    pl.col("b"),
    *(
        pl.when(pl.col("a") > 0).then(pl.col(var) / pl.col("a")).otherwise(pl.col(var))
        for var in avars
    ),
)

print(df)

Results:

shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ 0   ┆ 3   ┆ 5.0 ┆ 7.0 │
│ 3   ┆ 4   ┆ 2.0 ┆ 3.0 │
└─────┴─────┴─────┴─────┘