Search code examples
pythonpython-polars

How to swap column values on conditions in python polars?


I have a data frame as below:

df = pl.DataFrame({'last_name':[None,'mallesh','bhavik'],
                   'first_name':['a','b','c'],
                   'middle_name_or_initial':['aa','bb','cc']})
shape: (3, 3)
┌───────────┬────────────┬────────────────────────┐
│ last_name ┆ first_name ┆ middle_name_or_initial │
│ ---       ┆ ---        ┆ ---                    │
│ str       ┆ str        ┆ str                    │
╞═══════════╪════════════╪════════════════════════╡
│ null      ┆ a          ┆ aa                     │
│ mallesh   ┆ b          ┆ bb                     │
│ bhavik    ┆ c          ┆ cc                     │
└───────────┴────────────┴────────────────────────┘

Here I would like to find an observation which has First and Middle Name not NULL and Last Name is Null.

In this case last_name should take the value from first_name, first_name should take the value from middle_name, and middle_name should be NULL.

expected output will be:

shape: (3, 3)
┌───────────┬────────────┬────────────────────────┐
│ last_name ┆ first_name ┆ middle_name_or_initial │
│ ---       ┆ ---        ┆ ---                    │
│ str       ┆ str        ┆ str                    │
╞═══════════╪════════════╪════════════════════════╡
│ a         ┆ aa         ┆ null                   │
│ mallesh   ┆ b          ┆ bb                     │
│ bhavik    ┆ c          ┆ cc                     │
└───────────┴────────────┴────────────────────────┘

I'm trying with this command:

df.with_columns(
    pl.when((pl.col('first_name').is_not_null()) & (pl.col('middle_name_or_initial').is_not_null()) & (pl.col('last_name').is_null()))
      .then(pl.col('first_name').alias('last_name')).otherwise(pl.col('last_name').alias('first_name')),
    pl.when((pl.col('first_name').is_not_null()) & (pl.col('middle_name_or_initial').is_not_null()) & (pl.col('last_name').is_null()))
      .then(pl.col('middle_name_or_initial').alias('first_name')).alias('middle_name_or_initial')
)

But it is not producing the expected output.


Solution

  • You can actually change the values of multiple columns within a single when/then/otherwise statement.

    The Algorithm

    name_cols = ["last_name", "first_name", "middle_name_or_initial"]
    (
        df.with_columns(
            pl.when(
                pl.col("first_name").is_not_null(),
                pl.col("middle_name_or_initial").is_not_null(),
                pl.col("last_name").is_null()
            )
            .then(pl.struct(
                last_name = "first_name",
                first_name = "middle_name_or_initial",
                middle_name_or_initial = None
            ))
            .otherwise(pl.struct(name_cols))
            .struct.field(name_cols)
        )
    )
    
    shape: (3, 3)
    ┌───────────┬────────────┬────────────────────────┐
    │ last_name ┆ first_name ┆ middle_name_or_initial │
    │ ---       ┆ ---        ┆ ---                    │
    │ str       ┆ str        ┆ str                    │
    ╞═══════════╪════════════╪════════════════════════╡
    │ a         ┆ aa         ┆ null                   │
    │ mallesh   ┆ b          ┆ bb                     │
    │ bhavik    ┆ c          ┆ cc                     │
    └───────────┴────────────┴────────────────────────┘
    

    How it works

    To change the values of multiple columns within a single when/then/otherwise statement, we can use Structs.

    We use pl.struct() to create a struct for each condition.

    df.select(
        then = pl.struct(
            last_name = "first_name",
            first_name = "middle_name_or_initial",
            middle_name_or_initial = None
        ),
        otherwise = pl.struct(name_cols)
    )
    
    shape: (3, 2)
    ┌─────────────────┬──────────────────────┐
    │ then            ┆ otherwise            │
    │ ---             ┆ ---                  │
    │ struct[3]       ┆ struct[3]            │
    ╞═════════════════╪══════════════════════╡
    │ {"a","aa",null} ┆ {null,"a","aa"}      │
    │ {"b","bb",null} ┆ {"mallesh","b","bb"} │
    │ {"c","cc",null} ┆ {"bhavik","c","cc"}  │
    └─────────────────┴──────────────────────┘
    

    .struct.field() is used to "unnest" the result and replace the original values.