Search code examples
pythonpython-polars

Polars: change a value in a dataframe if a condition is met in another column


I have this dataframe

one two
a hola
b world

And I want to change hola for hello:

one two
a hello
b world

How can I change the values of a row based on a condition in another column?

For instance, with PostgreSQL I could do this:

UPDATE my_table SET two = 'hello' WHERE one = 'a';

Or in Spark

my_table.withColumn("two", when(col("one") == "a", "hello"))

I've tried using with_columns(pl.when(pl.col("one") == "a").then("hello")) but that changes the column "one".

EDIT: I could create a SQL instance and plot my way through via SQL but there must be way to achieve this via the Python API.


Solution

  • You were really close with with_columns(pl.when(pl.col("one") == "a").then("hello")) but you needed to tell it which column that should be.

    When you don't tell it which column you're referring to then it has to guess and in this case it guessed the column you referred to.

    Instead you do

    (df 
        .with_columns(
            two=pl.when(pl.col('one')=='a')
                    .then(pl.lit('hello'))
                    .otherwise(pl.col('two')))
    )
    

    This uses the **kwargs input of with_columns to allow the column named to be on the left of an equal sign as though it were a parameter to a function. You can also use alias syntax like this...

    (df 
        .with_columns(
            (pl.when(pl.col('one')=='a')
                    .then(pl.lit('hello'))
                    .otherwise(pl.col('two')))
                .alias('two')
                    )
    )
    

    Note that I wrapped the entire when/then/otherwise in parenthesis. The order of operations around when/then/otherwise and alias is weird so I find it's best to always completely wrap them in parenthesis to avoid unexpected results. Worst case scenario is you have redundant parenthesis which doesn't hurt anything.