How can I calculate the elementwise maximum of two columns in Polars inside an expression?
Polars version = 0.13.31
Problem statement as code:
import polars as pl
import numpy as np
df = pl.DataFrame({
    "a": np.arange(5),
    "b": np.arange(5)[::-1]
})
# Goal: a column named "max(a, b)" with the values [4, 3, 2, 3, 4], built with an expression inside df.select([ ... ])
Polars claims to support NumPy universal functions (docs), which include np.maximum, exactly the operation I need. However, when I try it, I get an error:
df.select([
    np.maximum(pl.col("a"), pl.col("b")).alias("max(a, b)")
])
# TypeError: maximum() takes from 2 to 3 positional arguments but 1 were given
There appears to be no Polars built-in for this. There is pl.max, but it is an aggregation that returns only the single largest value of a column, not an elementwise maximum across columns.
Using .map() also fails:
df.select([
    pl.col(["a", "b"]).map(np.maximum)
])
# PanicException
I can do this with the following snippet, but I want to do it inside an expression, as that is much more convenient:
df["max(a, b)"] = np.maximum(df["a"], df["b"])
In recent Polars releases you can use pl.max_horizontal(). With the same example data:
df = pl.select(
a = pl.int_range(0, 5),
b = pl.int_range(0, 5).reverse(),
)
shape: (5, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0 ┆ 4 │
│ 1 ┆ 3 │
│ 2 ┆ 2 │
│ 3 ┆ 1 │
│ 4 ┆ 0 │
└─────┴─────┘
df.with_columns(
pl.max_horizontal('a', 'b').alias('max(a, b)')
)
shape: (5, 3)
┌─────┬─────┬───────────┐
│ a ┆ b ┆ max(a, b) │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════════╡
│ 0 ┆ 4 ┆ 4 │
│ 1 ┆ 3 ┆ 3 │
│ 2 ┆ 2 ┆ 2 │
│ 3 ┆ 1 ┆ 3 │
│ 4 ┆ 0 ┆ 4 │
└─────┴─────┴───────────┘
The API also provides other horizontal functions, such as pl.min_horizontal, pl.sum_horizontal, pl.all_horizontal and pl.any_horizontal.