python-polars

Python Polars: finding the minimum value in col(b) greater than col(a), where col(a) is numeric and col(b) is a column of numeric lists


How do I convert the following slow operation in pandas to a fast operation in polars?

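import pandas as pd

# labels must be passed by keyword: the third positional parameter of pd.cut is `right`, not `labels`
df.to_pandas().apply(lambda x: pd.cut([x['ingame_timestamp']], bins=list(x['time_bins']), labels=list(x['time_bins'])[1:]), axis=1)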

Assume ingame_timestamp is a float and time_bins is a list.
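For reference, here is a plain-Python sketch of what the pandas one-liner computes for a single row, assuming labels=bins[1:] and pd.cut's defaults (right=True, include_lowest=False). With those defaults each bin is the half-open interval (left, right], so the label is the smallest edge that is greater than or equal to the value, and the result is NaN (None here) when the value is at or below the first edge or above the last one. The helper name cut_label is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def cut_label(value: float, bins: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: right edge of the (left, right] bin containing value.

    Mirrors pd.cut(..., right=True, labels=bins[1:]) for sorted bins.
    Returns None where pd.cut would return NaN.
    """
    i = bisect_left(bins, value)  # first index with bins[i] >= value
    if i == 0 or i == len(bins):  # value <= bins[0] or value > bins[-1]
        return None
    return bins[i]


rows = [(0, [-1, -0.5, 0.5, 1]), (1, [0, 0.5, 1.5, 2.5]), (2, [1.5, 2.5, 3, 4.5])]
print([cut_label(v, b) for v, b in rows])  # [0.5, 1.5, 2.5]
```

Note the edge case: for a value that falls exactly on an edge (e.g. 0.5 with bins [-1, -0.5, 0.5, 1]) pd.cut returns that edge, whereas "minimum value strictly greater than" would return the next one (1).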

I basically want to be able to do something like:

df.with_columns(pl.cut(value=pl.col('val'), bins=pl.col('time_bins'), labels=pl.col('time_bins')[1:]).alias('val_time_bin'))

The pandas version above works, but converting with to_pandas() and running a row-wise apply throws away most of the speed benefit of polars.

Here is an example DataFrame; the value_time_bin column holds the desired output:

import polars as pl

example_df = pl.DataFrame({
    'values': [0, 1, 2],
    'time_bins': [[-1.0, -0.5, 0.5, 1.0], [0.0, 0.5, 1.5, 2.5], [1.5, 2.5, 3.0, 4.5]],
    'value_time_bin': [0.5, 1.5, 2.5],
})

It is sufficient to find, for each row, the minimum value in the time_bins list that is greater than the value in "values".


Solution

  • import polars as pl
    
    # reproducible dataset
    df = pl.DataFrame({
        'values': [0, 1, 2],
        'time_bins': [[-1., -0.5, 0.5, 1.], [0., 0.5, 1.5, 2.5], [1.5, 2.5, 3., 4.5]]
    })
    

    If I understand you correctly, you need a column holding the smallest value in time_bins that is greater than the corresponding value in values.

    One way to do it:

    # group by a row index (not "values") so rows with equal values aren't merged;
    # maintain_order=True keeps the input row order (with_row_index: polars >= 0.20.4)
    df.with_row_index("row").explode("time_bins").group_by("row", maintain_order=True).agg(
        pl.col("values").first(),
        pl.col("time_bins"),
        pl.col("time_bins").filter(
            pl.col("time_bins") > pl.col("values")
        ).min().alias("value_time_bin")
    ).drop("row")
    
    shape: (3, 3)
    ┌────────┬────────────────────────┬────────────────┐
    │ values ┆ time_bins              ┆ value_time_bin │
    │ ---    ┆ ---                    ┆ ---            │
    │ i64    ┆ list[f64]              ┆ f64            │
    ╞════════╪════════════════════════╪════════════════╡
    │ 0      ┆ [-1.0, -0.5, 0.5, 1.0] ┆ 0.5            │
    │ 1      ┆ [0.0, 0.5, 1.5, 2.5]   ┆ 1.5            │
    │ 2      ┆ [1.5, 2.5, 3.0, 4.5]   ┆ 2.5            │
    └────────┴────────────────────────┴────────────────┘