python-polars

transforming multiple columns using loop


I am trying to transform groups of columns based on certain conditions. I am having trouble looping over groups of columns in my select statement. Simplified example (actual df has many more columns in avars and bvars):

import polars as pl

df = pl.DataFrame(
    {'a': [0,1,2,4],
     'b': [1,0,3,5],
     'w': [4,7,5,8],
     'x': [10, 20, 25, 30],
     'y': [15,3,16,88],
     'z': [22,17,4,32]
     }
)
avars=['w','x']
bvars=['y','z']

This works:

df.select(
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
)

But I get an error when I try this:

df.select(
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
    (pl.when(pl.col('b')>0)
    .then(pl.col(var)/pl.col('b')) 
    .otherwise(pl.col(var)) for var in bvars),
)

Even this fails:

df.select(
    pl.col('a'),
    pl.col('b'),
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
)

It seems that selecting any other column alongside the group of columns generated by the loop causes an error. Can someone point me in the right direction?


Solution

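  • You need to unpack the generator expression with *. select only expands an iterable of expressions when it is the sole positional argument; when other expressions are passed as well, each positional argument must be a single expression (or column name), so a bare generator is rejected.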

    import polars as pl
    
    df = pl.DataFrame({"a": [0, 3], "b": [3, 4], "c": [5, 6], "d": [7, 9]})
    
    avars = ["c", "d"]
    
    df = df.select(
        pl.col("a"),
        pl.col("b"),
        *(
            pl.when(pl.col("a") > 0).then(pl.col(var) / pl.col("a")).otherwise(pl.col(var))
            for var in avars
        ),
    )
    
    print(df)
    

    Results:

    shape: (2, 4)
    ┌─────┬─────┬─────┬─────┐
    │ a   ┆ b   ┆ c   ┆ d   │
    │ --- ┆ --- ┆ --- ┆ --- │
    │ i64 ┆ i64 ┆ f64 ┆ f64 │
    ╞═════╪═════╪═════╪═════╡
    │ 0   ┆ 3   ┆ 5.0 ┆ 7.0 │
    │ 3   ┆ 4   ┆ 2.0 ┆ 3.0 │
    └─────┴─────┴─────┴─────┘