polars ignore zeros when doing mean()

Not sure why I am finding this so hard.

For the following dataframe, I want to calculate the mean of each column per month group, like this:

df = df.group_by("month", maintain_order=True).mean()

However, I want the mean to ignore the 0.0 values.

I am trying to use polars.Expr.replace, but I can't find an example that applies to all columns rather than to specific existing columns.

How can I replace all the 0.0 values in the dataframe with, say, None or NaN so that the mean will ignore them?

┌─────────────────────────┬────────┬────────┬────────┬────────┬───┬────────┬────────┬────────┬────────┬─────────┐
│ ts                      ┆ 642935 ┆ 643128 ┆ 642929 ┆ 642930 ┆ … ┆ 642932 ┆ 642916 ┆ 642933 ┆ 643129 ┆ month   │
│ ---                     ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆   ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---     │
│ datetime[ns, UTC]       ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆   ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ str     │
╞═════════════════════════╪════════╪════════╪════════╪════════╪═══╪════════╪════════╪════════╪════════╪═════════╡
│ 2024-02-13 00:00:00 UTC ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 41.0   ┆ … ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 02-2024 │
│ 2024-02-13 00:05:00 UTC ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ … ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 02-2024 │
│ 2024-02-13 00:10:00 UTC ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ … ┆ 0.0    ┆ 44.0   ┆ 0.0    ┆ 0.0    ┆ 02-2024 │
│ 2024-02-13 00:15:00 UTC ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ … ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 0.0    ┆ 02-2024 │

Solution

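You can exclude them during the aggregation: filter the zeros out of every numeric column before taking the mean. A group whose values are all zero then has nothing left to average, so its mean is null.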

import polars as pl
import polars.selectors as cs

df = pl.DataFrame({
    "foo": [0.0, 1.0, 2.0],
    "bar": [0.0, 1.0, 0.0],
    "groups": ["g", "h", "h"]
})
(
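    # maintain_order=True keeps the groups in order of first appearance
    df.group_by("groups", maintain_order=True)
     .agg(
         # drop the zeros from each numeric column, then average what is left
         cs.numeric().filter(cs.numeric() != 0.0).mean()
     )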
)

Outputs:

shape: (2, 3)
┌────────┬──────┬──────┐
│ groups ┆ foo  ┆ bar  │
│ ---    ┆ ---  ┆ ---  │
│ str    ┆ f64  ┆ f64  │
╞════════╪══════╪══════╡
│ g      ┆ null ┆ null │
│ h      ┆ 1.5  ┆ 1.0  │
└────────┴──────┴──────┘