Find the next value >= the actual value plus 50% using polars

I have the following dataframe:

 df = pl.DataFrame({
        "Column A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Column B": [2, 3, 1, 4, 1, 7, 3, 2, 12, 0]
    })

I want to create a new column C that holds the distance, in rows, between the B value of the current row and the next value in column B that is greater than or equal to B + 50%.

The end result should look like this:

 df = pl.DataFrame({
        "Column A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Column B": [2, 3, 1, 4, 1, 7, 3, 2, 12, 0],
        "Column C": [1, 4, 1, 2, 1, 3, 2, 1, None, None]
    })

How can I efficiently achieve this using Polars, especially since I'm working with a large DataFrame?

Solution

Ok, so first I should say - this one looks like it requires join on inequality on multiple columns and from what I've found pure polars is not great with it. It's probably possible to do it with join_asof but I couldn't make it pretty.

I'd probably use duckdb integration with polars to achieve the results:

import duckdb

duckdb.sql("""
    select
        d."Column A",
        d."Column B",
        (
            select tt."Column A"
            from df as tt
            where tt."Column A" > d."Column A" and tt."Column B" >= d."Column B" * 1.5
            order by tt."Column A" asc
            limit 1
        ) - d."Column A" as "Column C"
    from df as d
""").pl()

┌──────────┬──────────┬──────────┐
│ Column A ┆ Column B ┆ Column C │
│ ---      ┆ ---      ┆ ---      │
│ i64      ┆ i64      ┆ i64      │
╞══════════╪══════════╪══════════╡
│ 1        ┆ 2        ┆ 1        │
│ 2        ┆ 3        ┆ 4        │
│ 3        ┆ 1        ┆ 1        │
│ 4        ┆ 4        ┆ 2        │
│ 5        ┆ 1        ┆ 1        │
│ 6        ┆ 7        ┆ 3        │
│ 7        ┆ 3        ┆ 2        │
│ 8        ┆ 2        ┆ 1        │
│ 9        ┆ 12       ┆ null     │
│ 10       ┆ 0        ┆ null     │
└──────────┴──────────┴──────────┘