How do you "pipe" an expression in Polars?
Consider this code:
def transformation(col:pl.Series)->pl.Series:
return col.tanh().name.suffix('_tanh')
It'd be nice to be able to do this:
df.with_columns(
pl.col('colA').pipe(transformation),
pl.col('colB').pipe(transformation),
pl.col('colC').pipe(transformation),
pl.col('colD').pipe(transformation),
)
But I don't think Polars supports .pipe for Series / expressions.
The alternative is
df.with_columns(
transformation(pl.col('colA')),
transformation(pl.col('colB')),
transformation(pl.col('colC')),
transformation(pl.col('colD')),
)
But this gets messy (IMO) when you have arguments to the transformation
function
I implemented this and it "works" for me
def _pipe(self, func, *args, **kwargs):
return func(self, *args, **kwargs)
pl.Expr.pipe = _pipe
Typically (like pandas) you'd apply pipe
at the DataFrame level.
Especially in conjunction with lazy-eval, this would be equivalent to chaining expressions; your function will receive the underlying eager/lazy frame, along with any optional *args and **kwargs, and by making it lazy()
you ensure that your chain of operations can still take advantage of the query optimiser and parallelisation.
For example:
import polars as pl
# define some UDFs
def extend_with_tan( df ):
return df.with_columns( pl.all().tanh().name.suffix("_tanh") )
def mul_in_place( df, n ):
return df.select( (pl.all() * n).name.suffix(f"_x{n}") )
# init lazyframe
df = pl.DataFrame({
"colA": [-4],
"colB": [-2],
"colC": [10],
}).lazy()
# pipe/result
dfx = df.pipe( extend_with_tan ).pipe( mul_in_place,n=3 )
dfx.collect()
# ┌─────────┬─────────┬─────────┬──────────────┬──────────────┬──────────────┐
# │ colA_x3 ┆ colB_x3 ┆ colC_x3 ┆ colA_tanh_x3 ┆ colB_tanh_x3 ┆ colC_tanh_x3 │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 ┆ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═════════╪═════════╪══════════════╪══════════════╪══════════════╡
# │ -12 ┆ -6 ┆ 30 ┆ -2.997988 ┆ -2.892083 ┆ 3.0 │
# └─────────┴─────────┴─────────┴──────────────┴──────────────┴──────────────┘