Unable to apply user defined function on polars dataframe to create new variables


I am trying to create new variables by applying a lambda function but I am getting an error.

I have tried the below code:

import polars as pl

df = pl.DataFrame({
    "a": [1, 8, 3],
    "b": [4, 5, 9]
})
df.with_columns(
    pl.all().map_elements(lambda x: (x - pl.mean(x)) / x)
      .name.suffix('_x')
)

And I get the following error:

ComputeError: TypeError: invalid input for `col`

Expected `str` or `DataType`, got 'int'.

Solution

  • pl.mean(name) is shorthand for pl.col(name).mean(), so it expects a column name:

    >>> pl.mean('foo')
    <Expr ['col("foo").mean()'] at 0x136610DC0>
    
    >>> pl.col('foo').mean()
    <Expr ['col("foo").mean()'] at 0x1368B52A0>
    

    But map_elements calls your lambda once per value, so x is a plain Python int rather than a column. pl.mean(x) therefore becomes a call like pl.mean(1), and Polars rejects it because 1 is not a column name.

    You can perform the task using Polars expressions directly, without a UDF. The example below subtracts each column's mean:

    df.with_columns(
       (pl.all() - pl.all().mean())
          .name.suffix('_x')
    )
    
    shape: (3, 4)
    ┌─────┬─────┬──────┬──────┐
    │ a   ┆ b   ┆ a_x  ┆ b_x  │
    │ --- ┆ --- ┆ ---  ┆ ---  │
    │ i64 ┆ i64 ┆ f64  ┆ f64  │
    ╞═════╪═════╪══════╪══════╡
    │ 1   ┆ 4   ┆ -3.0 ┆ -2.0 │
    │ 8   ┆ 5   ┆ 4.0  ┆ -1.0 │
    │ 3   ┆ 9   ┆ -1.0 ┆ 3.0  │
    └─────┴─────┴──────┴──────┘