python, dataframe, python-polars

Polars: Concatenating column names depending on value


I have a table as follows:

    a   b   c   d   e
    0   1   0   1   0
    1   0   0   0   0

I want to create a column RESULT that concatenates the names of the columns whose value in that row is 1.

    a   b   c   d   e    RESULT
    0   1   0   1   0    bd   
    1   0   0   0   0    a

What's the most efficient way of doing this with Polars?

I can do this via map_elements, but I wonder if there is a more efficient way.


Solution

  • The usual approach is to build a when/then expression for each name in df.columns.

    Here, each expression should yield the column name when the value is 1 and an empty string otherwise.

    You can pass these expressions directly to pl.concat_str() to combine them into a single string.

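    import polars as pl

    df = pl.DataFrame({
       'a': [0, 1], 'b': [1, 0], 'c': [0, 0], 'd': [1, 0], 'e': [0, 0]
    })

    # One expression per column: its name where the value is 1.
    # Without an otherwise(), when/then yields null, so fill_null("") replaces it.
    df.with_columns(RESULT =
       pl.concat_str(
          pl.when(pl.col(col) == 1).then(pl.lit(col)).fill_null("")
          for col in df.columns
       )
    )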
    
    shape: (2, 6)
    ┌─────┬─────┬─────┬─────┬─────┬────────┐
    │ a   ┆ b   ┆ c   ┆ d   ┆ e   ┆ RESULT │
    │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ ---    │
    │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ str    │
    ╞═════╪═════╪═════╪═════╪═════╪════════╡
    │ 0   ┆ 1   ┆ 0   ┆ 1   ┆ 0   ┆ bd     │
    │ 1   ┆ 0   ┆ 0   ┆ 0   ┆ 0   ┆ a      │
    └─────┴─────┴─────┴─────┴─────┴────────┘