I am trying to create new variables by applying a lambda
function, but I am getting an error.
I have tried the code below:
import polars as pl

df = pl.DataFrame({
    "a": [1, 8, 3],
    "b": [4, 5, 9]
})

df.with_columns(
    pl.all().map_elements(lambda x: (x - pl.mean(x)) / x)
    .name.suffix('_x')
)
And I get the following error:
ComputeError: TypeError: invalid input for `col`
Expected `str` or `DataType`, got 'int'.
pl.mean is shorthand for pl.col().mean:
>>> pl.mean('foo')
<Expr ['col("foo").mean()'] at 0x136610DC0>
>>> pl.col('foo').mean()
<Expr ['col("foo").mean()'] at 0x1368B52A0>
But you are trying to use it on the values inside a custom Python function. map_elements calls the lambda once per element, so pl.mean(x) ends up as a call like pl.mean(1), which makes no sense to Polars.
You can perform the task using Polars Expressions directly without the need for a UDF:
df.with_columns(
    (pl.all() - pl.all().mean())
    .name.suffix('_x')
)
shape: (3, 4)
┌─────┬─────┬──────┬──────┐
│ a   ┆ b   ┆ a_x  ┆ b_x  │
│ --- ┆ --- ┆ ---  ┆ ---  │
│ i64 ┆ i64 ┆ f64  ┆ f64  │
╞═════╪═════╪══════╪══════╡
│ 1   ┆ 4   ┆ -3.0 ┆ -2.0 │
│ 8   ┆ 5   ┆ 4.0  ┆ -1.0 │
│ 3   ┆ 9   ┆ -1.0 ┆ 3.0  │
└─────┴─────┴──────┴──────┘
└─────┴─────┴──────┴──────┘