
Score number of "True" instances with python-polars


I am working on a dataframe with the following structure:

import polars as pl

df = pl.DataFrame({
    "datetime": [
        "2024-09-24 00:00",
        "2024-09-24 01:020",
        "2024-09-24 02:00",
        "2024-09-24 03:00",
    ],
    "Bucket1": [2.5, 8, 0.7, 12],
    "Bucket2": [3.7, 10.1, 25.9, 9.9],
    "Bucket3": [40.0, 15.5, 10.7, 56],
})

My goal is to output a table that counts, for each row, how many of the bucket values fall within a given range (here 0 to 10), something like this:

shape: (4, 2)
┌───────────────────┬──────┐
│ datetime          ┆ 0-10 │
│ ---               ┆ ---  │
│ str               ┆ u32  │
╞═══════════════════╪══════╡
│ 2024-09-24 00:00  ┆ 2    │
│ 2024-09-24 01:020 ┆ 1    │
│ 2024-09-24 02:00  ┆ 1    │
│ 2024-09-24 03:00  ┆ 1    │
└───────────────────┴──────┘

I have tried a couple of approaches. The first used pl.when together with .is_between, something like when (Bucket1.is_between(0, 10, closed="left") | Bucket2.is_between(0, 10, closed="left")) then (1)

But the result just evaluates to 1 regardless of how many buckets evaluate to True.

I also tried pl.concat_list:

columns = ["Bucket1", "Bucket2", "Bucket3"]
df.with_columns(
    pl.concat_list(
        [pl.col(col).is_between(0,10,closed="left") for col in columns]
    )
    .arr.sum()
    .alias("0-10")
)

The first approach didn't work, as it just returns a list of booleans. The second one errors out with: Invalid input for "col". Expected iterable of type "str" or "DataType", got iterable of "Expr".

How could I tackle this problem using Polars?


Solution

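  • As of Polars 1.8.1 (the latest version at the time of writing), your code runs as expected after replacing the arr namespace with the list namespace. pl.concat_list produces a List column, and the arr namespace is now reserved for the fixed-size Array type.

    It can also be simplified to avoid the list comprehension, because pl.col accepts a list of column names: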

    cols = ["Bucket1", "Bucket2", "Bucket3"]
    
    df.with_columns(
        pl.concat_list(pl.col(cols).is_between(0, 10, closed="left")).list.sum().alias("0-10")
    )
    
    shape: (4, 5)
    ┌───────────────────┬─────────┬─────────┬─────────┬──────┐
    │ datetime          ┆ Bucket1 ┆ Bucket2 ┆ Bucket3 ┆ 0-10 │
    │ ---               ┆ ---     ┆ ---     ┆ ---     ┆ ---  │
    │ str               ┆ f64     ┆ f64     ┆ f64     ┆ u32  │
    ╞═══════════════════╪═════════╪═════════╪═════════╪══════╡
    │ 2024-09-24 00:00  ┆ 2.5     ┆ 3.7     ┆ 40.0    ┆ 2    │
    │ 2024-09-24 01:020 ┆ 8.0     ┆ 10.1    ┆ 15.5    ┆ 1    │
    │ 2024-09-24 02:00  ┆ 0.7     ┆ 25.9    ┆ 10.7    ┆ 1    │
    │ 2024-09-24 03:00  ┆ 12.0    ┆ 9.9     ┆ 56.0    ┆ 1    │
    └───────────────────┴─────────┴─────────┴─────────┴──────┘