How to get max string per group in Polars?


I need help returning the maximum string in a column within each group (grouping by one or more columns) in Polars. I have an example where this works for a numeric column, but not for a string column.

import polars as pl

pl.__version__
# '0.15.2'

d_int = pl.DataFrame({
    "g": ["a", "a", "b"],
    "v": [1, 2, 3],
})

# works
(
    d_int
        .group_by("g")
        .agg(
            pl.col("v").min().alias("v_min"), 
            pl.col("v").max().alias("v_max")
            )
)


d_str = pl.DataFrame({
    "g": ["a", "a", "b"],
    "v": ["x", "y", "x"],
})


# returns nulls
(
    d_str
        .group_by("g")
        .agg(
            pl.col("v").min().alias("v_min"), 
            pl.col("v").max().alias("v_max")
            )
)

The first calculation works, but the second returns the following (with nulls).

┌─────┬───────┬───────┐
│ g   ┆ v_min ┆ v_max │
│ --- ┆ ---   ┆ ---   │
│ str ┆ str   ┆ str   │
╞═════╪═══════╪═══════╡
│ a   ┆ null  ┆ null  │
│ b   ┆ null  ┆ null  │
└─────┴───────┴───────┘


Solution

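  • Support for min/max on string columns in this aggregation was added in polars==0.15.3. After upgrading from 0.15.2, the second example should return the expected strings instead of nulls.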