Search code examples
pythonpython-polars

Polars: use value from column as column name in when then expression


In a polars dataframe I have a column that contains the names of other columns (column "id_column_name"). I want to use those names in a when-then expression with pl.col() to create a new column ("id") which gets its values out of these other columns ("id_column1", "id_column2"). Every row can gets its value from another column in the df.

# initial df
df = pl.DataFrame({
        'id_column1': [123, 456],
        'id_column2': ['abc', 'def'],
        'id_column_name': ['id_column1', 'id_column2']
})

# required output df
df_out = pl.DataFrame({
    'id_column1': [123, 456],
    'id_column2': ['abc', 'def'],
    'id_column_name': ['id_column1', 'id_column2'],
    'id': ['123', 'def'] 
})

# one of the trings I tried
df = df.with_columns(
    pl.when(pl.col('id_column_name').is_not_null())
    .then(pl.col(pl.col('id_column_name')))
    .otherwise(None)
    .cast(pl.String)
    .alias('id')
)

This leads to the error: Expected str or DataType, got 'Expr'.

Using str() or .str to create the expression into a regular string lead to other errors:

Expected str or DataType, got 'ExprStringNameSpace'.

This cant be that hard, can it?


Solution

  • df.with_columns(
        id = pl.coalesce(
            pl.when(pl.col.id_column_name == col).then(pl.col(col))
            for col in df.schema if col not in ("id_column_name")
            # or this if amount of columns is large
            # for col in df["id_column_name"].unique()
        )
    )
    
    shape: (2, 4)
    ┌────────────┬────────────┬────────────────┬─────┐
    │ id_column1 ┆ id_column2 ┆ id_column_name ┆ id  │
    │ ---        ┆ ---        ┆ ---            ┆ --- │
    │ i64        ┆ str        ┆ str            ┆ str │
    ╞════════════╪════════════╪════════════════╪═════╡
    │ 123        ┆ abc        ┆ id_column1     ┆ 123 │
    │ 456        ┆ def        ┆ id_column2     ┆ def │
    └────────────┴────────────┴────────────────┴─────┘