I am working on a dataframe with the following structure:
import polars as pl

df = pl.DataFrame({
    "datetime": [
        "2024-09-24 00:00",
        "2024-09-24 01:020",
        "2024-09-24 02:00",
        "2024-09-24 03:00",
    ],
    "Bucket1": [2.5, 8, 0.7, 12],
    "Bucket2": [3.7, 10.1, 25.9, 9.9],
    "Bucket3": [40.0, 15.5, 10.7, 56],
})
My goal is to output a table that counts, for each row, how many of the Bucket columns hold a value in a given range (here 0 to 10, excluding 10), something like this:
shape: (4, 2)
┌───────────────────┬──────┐
│ datetime ┆ 0-10 │
│ --- ┆ --- │
│ str ┆ u32 │
╞═══════════════════╪══════╡
│ 2024-09-24 00:00 ┆ 2 │
│ 2024-09-24 01:020 ┆ 1 │
│ 2024-09-24 02:00 ┆ 1 │
│ 2024-09-24 03:00 ┆ 1 │
└───────────────────┴──────┘
I have tried a couple of approaches. One uses pl.when together with .is_between, along the lines of when(Bucket1.is_between(0, 10, closed="left") | Bucket2.is_between(0, 10, closed="left")).then(1), but the result is always 1 regardless of how many Buckets evaluate to True.
Another uses pl.concat_list:
columns = ["Bucket1", "Bucket2", "Bucket3"]
df.with_columns(
    pl.concat_list(
        [pl.col(col).is_between(0, 10, closed="left") for col in columns]
    )
    .arr.sum()
    .alias("0-10")
)
The first approach didn't work, as it just returns a list of booleans. The second one errors out with: Invalid input for "col", Expected iterable of type "str" or "DataType", got iterable of "Expr"
How could I tackle this problem using Polars?
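In the latest version of Polars (1.8.1 at the time of writing), your code runs as expected after replacing the arr namespace with the list namespace. pl.concat_list produces a List column, whereas arr is the namespace for the fixed-width Array data type.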
Moreover, the list comprehension is unnecessary: pl.col accepts a list of column names, and is_between is then applied to each of those columns.
cols = ["Bucket1", "Bucket2", "Bucket3"]
df.with_columns(
    pl.concat_list(pl.col(cols).is_between(0, 10, closed="left"))
    .list.sum()
    .alias("0-10")
)
shape: (4, 5)
┌───────────────────┬─────────┬─────────┬─────────┬──────┐
│ datetime ┆ Bucket1 ┆ Bucket2 ┆ Bucket3 ┆ 0-10 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ u32 │
╞═══════════════════╪═════════╪═════════╪═════════╪══════╡
│ 2024-09-24 00:00 ┆ 2.5 ┆ 3.7 ┆ 40.0 ┆ 2 │
│ 2024-09-24 01:020 ┆ 8.0 ┆ 10.1 ┆ 15.5 ┆ 1 │
│ 2024-09-24 02:00 ┆ 0.7 ┆ 25.9 ┆ 10.7 ┆ 1 │
│ 2024-09-24 03:00 ┆ 12.0 ┆ 9.9 ┆ 56.0 ┆ 1 │
└───────────────────┴─────────┴─────────┴─────────┴──────┘