Compute percentage of positive rows in a group_by polars DataFrame


I need to compute the percentage of positive values in the value column grouped by the group column.

import polars as pl

df = pl.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
        "value": [2, -1, 3, 1, -2, 1, 2, -1, 3, 2],
    }
)

shape: (10, 2)
┌───────┬───────┐
│ group ┆ value │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ A     ┆ 2     │
│ A     ┆ -1    │
│ A     ┆ 3     │
│ A     ┆ 1     │
│ A     ┆ -2    │
│ B     ┆ 1     │
│ B     ┆ 2     │
│ B     ┆ -1    │
│ B     ┆ 3     │
│ B     ┆ 2     │
└───────┴───────┘

In group A, 3 out of 5 values are positive (60%), while in group B, 4 out of 5 values are positive (80%).

Here's the expected dataframe.

┌────────┬──────────────────┐
│ group  ┆ positive_percent │
│ ---    ┆ ---              │
│ str    ┆ f64              │
╞════════╪══════════════════╡
│ A      ┆ 0.6              │
│ B      ┆ 0.8              │
└────────┴──────────────────┘

Solution

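  • You could use group_by.agg with Expr.ge and Expr.mean. Expr.ge(0) converts each value to True/False depending on its sign, and the mean of those booleans is the proportion of True values in each group (note that ge(0) also counts zeros as positive; the sample data has none, and Expr.gt(0) would exclude them):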

    df.group_by('group').agg(positive_percent=pl.col('value').ge(0).mean())
    

    Output:

    ┌───────┬──────────────────┐
    │ group ┆ positive_percent │
    │ ---   ┆ ---              │
    │ str   ┆ f64              │
    ╞═══════╪══════════════════╡
    │ A     ┆ 0.6              │
    │ B     ┆ 0.8              │
    └───────┴──────────────────┘
    

    Intermediates:

    ┌───────┬───────┬───────┬──────┐
    │ group ┆ value ┆ ge(0) ┆ mean │
    │ ---   ┆ ---   ┆ ---   ┆ ---  │
    │ str   ┆ i64   ┆ bool  ┆ f64  │
    ╞═══════╪═══════╪═══════╪══════╡
    │ A     ┆ 2     ┆ true  ┆ 0.6  │ #
    │ A     ┆ -1    ┆ false ┆ 0.6  │ # group A
    │ A     ┆ 3     ┆ true  ┆ 0.6  │ # (True+False+True+True+False)/5
    │ A     ┆ 1     ┆ true  ┆ 0.6  │ # = 3/5 = 0.6
    │ A     ┆ -2    ┆ false ┆ 0.6  │ #
    │ B     ┆ 1     ┆ true  ┆ 0.8  │
    │ B     ┆ 2     ┆ true  ┆ 0.8  │
    │ B     ┆ -1    ┆ false ┆ 0.8  │
    │ B     ┆ 3     ┆ true  ┆ 0.8  │
    │ B     ┆ 2     ┆ true  ┆ 0.8  │
    └───────┴───────┴───────┴──────┘