
How do you access a column name in a polars expression?


I implemented a sigmoid transformation in polars as follows:

import polars as pl

def sigmoid(c: pl.Expr) -> pl.Expr:
    return 1 / ((-c).exp() + 1)

This works great, except that by polars naming conventions the resulting column is called 'literal', presumably because the expression starts with the literal 1.

I could keep the column name by re-writing sigmoid as

def sigmoid(c:pl.Expr)->pl.Expr:
    return ((c * -1).exp() + 1)**-1

But: (A) that is horrible, and (B) I don't want my code to depend on this "magical/invisible" tracking of column names.

What I'd like to do is add a .alias() at the end of my function to ensure the column name is preserved.

The following pseudo-code expresses the idea:

def sigmoid(c:pl.Expr)->pl.Expr:
    return (1 / ((-c).exp() + 1)).alias(c.name)

However, polars expressions do not have a .name attribute that returns the column name.

How else could I keep the column name?

Note that I could do:

df.select(
   pl.col('a').pipe(sigmoid).alias('a'), 
   pl.col('b').pipe(sigmoid).alias('b'), 
   pl.col('c').pipe(sigmoid).alias('c'), 
   ...
)

But that is cumbersome, and it would not work well with:

df.select(
   pl.all().pipe(sigmoid)
)

Solution

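  • Update: You can use your original function together with .name.keep(), which restores each input column's name and so avoids the DuplicateError raised when every output column is named 'literal':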

    df.with_columns(pl.all().pipe(sigmoid).name.keep())
    
    shape: (3, 3)
    ┌──────────┬──────────┬──────────┐
    │ a        ┆ b        ┆ c        │
    │ ---      ┆ ---      ┆ ---      │
    │ f64      ┆ f64      ┆ f64      │
    ╞══════════╪══════════╪══════════╡
    │ 0.731059 ┆ 0.982014 ┆ 0.999089 │
    │ 0.880797 ┆ 0.993307 ┆ 0.999665 │
    │ 0.952574 ┆ 0.997527 ┆ 0.999877 │
    └──────────┴──────────┴──────────┘
    

    Original answer.

    There are introspection methods such as .output_name() and .root_names() in the meta namespace:

    pl.col("a").meta.output_name()
    
    # 'a'
    
    pl.when(pl.col("a") == 1).then(pl.col("b")).otherwise(pl.col("c")).meta.output_name()
    
    # 'b'
    
    pl.when(pl.col("a") == 1).then(pl.col("b")).otherwise(pl.col("c")).meta.root_names()
    
    # ['b', 'c', 'a']
    
    def sigmoid(c: pl.Expr) -> pl.Expr:
        return (1 / ((-c).exp() + 1)).alias(c.meta.output_name())
    
    pl.DataFrame(dict(a = [1, 2, 3])).select(sigmoid(pl.col("a")))
    
    shape: (3, 1)
    ┌──────────┐
    │ a        │
    │ ---      │
    │ f64      │
    ╞══════════╡
    │ 0.731059 │
    │ 0.880797 │
    │ 0.952574 │
    └──────────┘