python, python-polars

Polars Case Statement


I am trying to learn the Python package polars. I come from an R background, so I appreciate this might be an incredibly easy question.

I want to implement a case statement: if any of the conditions below is true, the row is flagged with 1; otherwise it gets 0. The new column will be called 'my_new_column_flag'.

However, I am getting this error message:

Traceback (most recent call last):
  File "", line 2, in
  File "C:\Users\foo\Miniconda3\envs\env\lib\site-packages\polars\internals\lazy_functions.py", line 204, in col
    return pli.wrap_expr(pycol(name))
TypeError: argument 'name': 'int' object cannot be converted to 'PyString'

import polars as pl
import numpy as np

np.random.seed(12)

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
print(df)

df.with_columns(
    pl.when(pl.col('nrs') == 1).then(pl.col(1))
    .when(pl.col('names') == 'ham').then(pl.col(1))
    .when(pl.col('random') == 0.014575).then(pl.col(1))
    .otherwise(pl.col(0))
    .alias('my_new_column_flag')
)

Can anyone help?


Solution

  • pl.col selects a column by its name (a string), so pl.col(1) raises the TypeError shown in the question. What you want is a literal value of one: pl.lit(1)

    df.with_columns(
        pl.when(pl.col('nrs') == 1).then(pl.lit(1))
        .when(pl.col('names') == 'ham').then(pl.lit(1))
        .when(pl.col('random') == 0.014575).then(pl.lit(1))
        .otherwise(pl.lit(0))
        .alias('my_new_column_flag')
    )
    

    PS: it may look more natural to use a predicate for your flag (and cast it to int if you want 0/1 instead of true/false):

    
    df.with_columns(
        ((pl.col("nrs") == 1) | (pl.col("names") == "ham") | (pl.col("random") == 0.014575))
        .alias("my_new_column_flag")
        .cast(int)
    )