python dataframe machine-learning poisson python-polars

How to Write Poisson CDF as Python Polars Expression


I have a collection of Polars expressions used to generate features for an ML model. I'd like to add a Poisson CDF feature to this collection while keeping lazy execution (with its benefits of speed, caching, etc.). So far I have not found an easy way to achieve this.

I've been able to get the result I'd like outside of the desired lazy expression framework with:

import polars as pl
from scipy.stats import poisson

df = pl.DataFrame({"count": [9,2,3,4,5], "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
result = poisson.cdf(df["count"].to_numpy(), df["expected_count"].to_numpy())
df = df.with_columns(pl.Series(result).alias("poisson_cdf"))

However, in reality I'd like this to look like:

df = pl.DataFrame({"count": [9,2,3,4,5], "expected_count": [7.7, 0.2, 0.7, 1.1, 7.5]})
df = df.select(
    [
        ... # bunch of other expressions here
        poisson_cdf()
    ]
)

where poisson_cdf is some polars expression like:

def poisson_cdf():
    # this is just for illustration, clearly won't work
    return scipy.stats.poisson.cdf(pl.col("count"), pl.col("expected_count")).alias("poisson_cdf")

I also tried building a struct from "count" and "expected_count" and calling apply on it, as the docs advise for custom functions. However, my real dataset has several million rows, so the per-row function calls led to absurd execution times.

Any advice or guidance here would be appreciated. Ideally there exists an expression like this somewhere out there? Thanks in advance!


Solution

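  • It sounds like you want to use .map_batches(). It passes the whole column to your function as a single Series, so the vectorized SciPy call runs once instead of once per row.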

    df.with_columns(
       pl.struct("count", "expected_count")
         .map_batches(lambda x: 
            poisson.cdf(x.struct.field("count"), x.struct.field("expected_count"))
         )
         .alias("poisson_cdf")
    )
    
    shape: (5, 3)
    ┌───────┬────────────────┬─────────────┐
    │ count ┆ expected_count ┆ poisson_cdf │
    │ ---   ┆ ---            ┆ ---         │
    │ i64   ┆ f64            ┆ f64         │
    ╞═══════╪════════════════╪═════════════╡
    │ 9     | 7.7            | 0.75308     │
    │ 2     | 0.2            | 0.998852    │
    │ 3     | 0.7            | 0.994247    │
    │ 4     | 1.1            | 0.994565    │
    │ 5     | 7.5            | 0.241436    │
    └───────┴────────────────┴─────────────┘