In a polars dataframe I have a column that contains the names of other columns (column "id_column_name"). I want to use those names in a when-then expression with pl.col() to create a new column ("id") which gets its values out of these other columns ("id_column1", "id_column2"). Every row can gets its value from another column in the df.
# initial df
df = pl.DataFrame({
'id_column1': [123, 456],
'id_column2': ['abc', 'def'],
'id_column_name': ['id_column1', 'id_column2']
})
# required output df
df_out = pl.DataFrame({
'id_column1': [123, 456],
'id_column2': ['abc', 'def'],
'id_column_name': ['id_column1', 'id_column2'],
'id': ['123', 'def']
})
# one of the trings I tried
df = df.with_columns(
pl.when(pl.col('id_column_name').is_not_null())
.then(pl.col(pl.col('id_column_name')))
.otherwise(None)
.cast(pl.String)
.alias('id')
)
This leads to the error: Expected
str
orDataType
, got 'Expr'.
Using str() or .str to create the expression into a regular string lead to other errors:
Expected
str
orDataType
, got 'ExprStringNameSpace'.
This cant be that hard, can it?
pl.Series.unique()
to get all possible values of id_column_name
column.pl.when()
to create conditional results.pl.coalesce()
to fill the final result with first non-empty value.df.with_columns(
id = pl.coalesce(
pl.when(pl.col.id_column_name == col).then(pl.col(col))
for col in df.schema if col not in ("id_column_name")
# or this if amount of columns is large
# for col in df["id_column_name"].unique()
)
)
shape: (2, 4)
┌────────────┬────────────┬────────────────┬─────┐
│ id_column1 ┆ id_column2 ┆ id_column_name ┆ id │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞════════════╪════════════╪════════════════╪═════╡
│ 123 ┆ abc ┆ id_column1 ┆ 123 │
│ 456 ┆ def ┆ id_column2 ┆ def │
└────────────┴────────────┴────────────────┴─────┘