Polars equivalent to Pandas min_count on groupby


I'm trying to find the Polars equivalent of the min_count parameter on a pandas groupby, as in df.groupby(key).sum(min_count=N).

Suppose I have this dataframe:

┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ a     ┆ 1     │
│ a     ┆ 3     │
│ a     ┆ 5     │
│ b     ┆ 10    │
│ b     ┆ 10    │
│ b     ┆ 10    │
│ b     ┆ 20    │
└───────┴───────┘

How can I group by the fruit key with the constraint that a group must have at least 4 values to be summed?

So instead of

┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ b     ┆ 50    │
│ a     ┆ 9     │
└───────┴───────┘

I'd have only fruit b in the output, since it's the only one with at least 4 elements:

┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ b     ┆ 50    │
└───────┴───────┘

Solution

  • I don't think there's a built-in min_count for this, but you can just filter:

    (
        df.group_by("fruit")
        .agg(pl.col("price").sum(), pl.count())
        .filter(pl.col("count") >= 4)
        .drop("count")
    )
    
    shape: (1, 2)
    ┌───────┬───────┐
    │ fruit ┆ price │
    │ ---   ┆ ---   │
    │ str   ┆ i64   │
    ╞═══════╪═══════╡
    │ b     ┆ 50    │
    └───────┴───────┘