Not sure why I am finding this so hard.
For the following dataframe I want to calculate the mean for the grouped months like
df = df.group_by("month", maintain_order=True).mean()
However, I want to ignore 0.0's
I am trying to use polars.Expr.replace but can't find an example that does not rely on existing columns, etc.
How can I replace all the 0.0 values in the dataframe with, say, None or NaN so the mean will ignore them?
┌─────────────────────────┬────────┬────────┬────────┬────────┬───┬────────┬────────┬────────┬────────┬─────────┐
│ ts ┆ 642935 ┆ 643128 ┆ 642929 ┆ 642930 ┆ … ┆ 642932 ┆ 642916 ┆ 642933 ┆ 643129 ┆ month │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ns, UTC] ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str │
╞═════════════════════════╪════════╪════════╪════════╪════════╪═══╪════════╪════════╪════════╪════════╪═════════╡
│ 2024-02-13 00:00:00 UTC ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 41.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 02-2024 │
│ 2024-02-13 00:05:00 UTC ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 02-2024 │
│ 2024-02-13 00:10:00 UTC ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 44.0 ┆ 0.0 ┆ 0.0 ┆ 02-2024 │
│ 2024-02-13 00:15:00 UTC ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 02-2024 │
You can exclude them during the aggregation.
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
    "foo": [0.0, 1.0, 2.0],
    "bar": [0.0, 1.0, 0.0],
    "groups": ["g", "h", "h"]
})
(
    df.group_by("groups")
    .agg(
        cs.numeric().filter(cs.numeric() != 0.0).mean()
    )
)
Outputs:
shape: (2, 3)
┌────────┬──────┬──────┐
│ groups ┆ foo ┆ bar │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞════════╪══════╪══════╡
│ g ┆ null ┆ null │
│ h ┆ 1.5 ┆ 1.0 │
└────────┴──────┴──────┘