I'm trying to find the equivalent of a min_count parameter on a polars group-by, as in pandas' df.groupby(key).sum(min_count=N).
Suppose we have the following dataframe:
┌───────┬───────┐
│ fruit ┆ price │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ a ┆ 1 │
│ a ┆ 3 │
│ a ┆ 5 │
│ b ┆ 10 │
│ b ┆ 10 │
│ b ┆ 10 │
│ b ┆ 20 │
└───────┴───────┘
How can I group by the fruit key, with the constraint that a group must have at least 4 values to be summed?
So instead of:
┌───────┬───────┐
│ fruit ┆ price │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ b ┆ 50 │
│ a ┆ 9 │
└───────┴───────┘
I'd get only fruit b in the output, since it's the only group with at least 4 elements:
┌───────┬───────┐
│ fruit ┆ price │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ b ┆ 50 │
└───────┴───────┘
I don't think there's a built-in min_count equivalent for this, but you can count the rows in each group and filter on that count:
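import polars as pl

(
    df.group_by("fruit")
    .agg(
        pl.col("price").sum(),
        pl.len().alias("count"),  # rows per group; use pl.count() on polars < 0.20.5
    )
    .filter(pl.col("count") >= 4)  # keep only groups with at least 4 rows
    .drop("count")
)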
shape: (1, 2)
┌───────┬───────┐
│ fruit ┆ price │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ b ┆ 50 │
└───────┴───────┘