I need help returning the maximum string in a column for each group in Polars. I have a working example for the numeric maximum, but the same approach does not work for strings.
import polars as pl
pl.__version__
# '0.15.2'
d_int = pl.DataFrame({
"g": ["a", "a", "b"],
"v": [1, 2, 3],
})
# works
(
d_int
.group_by("g")
.agg(
pl.col("v").min().alias("v_min"),
pl.col("v").max().alias("v_max")
)
)
d_str = pl.DataFrame({
"g": ["a", "a", "b"],
"v": ["x", "y", "x"],
})
# returns nulls
(
d_str
.group_by("g")
.agg(
pl.col("v").min().alias("v_min"),
pl.col("v").max().alias("v_max")
)
)
The first calculation works, but the second returns the following (with nulls).
┌─────┬───────┬───────┐
│ g   ┆ v_min ┆ v_max │
│ --- ┆ ---   ┆ ---   │
│ str ┆ str   ┆ str   │
╞═════╪═══════╪═══════╡
│ a   ┆ null  ┆ null  │
│ b   ┆ null  ┆ null  │
└─────┴───────┴───────┘
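Support for min and max on string columns in a group-by aggregation was added in polars==0.15.3, so after upgrading from 0.15.2 the second query should work as expected.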