Search code examples
pythonpython-polars

Find the next value >= the actual value plus 50% using polars


I have the following dataframe:

 df = pl.DataFrame({
        "Column A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Column B": [2, 3, 1, 4, 1, 7, 3, 2, 12, 0]
    })

I want to create a new column C that holds the distance, in rows, between the B value of the current row and the next value in column B that is greater than or equal to B + 50%.

The end result should look like this:

 df = pl.DataFrame({
        "Column A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Column B": [2, 3, 1, 4, 1, 7, 3, 2, 12, 0],
        "Column C": [1, 4, 1, 2, 1, 3, 2, 1, None, None]
    })

How can I efficiently achieve this using Polars, especially since I'm working with a large DataFrame?


Solution

  • Ok, so first I should say - this one looks like it requires join on inequality on multiple columns and from what I've found pure polars is not great with it. It's probably possible to do it with join_asof but I couldn't make it pretty.

    I'd probably use duckdb integration with polars to achieve the results:

    import duckdb
    
    duckdb.sql("""
        select
            d."Column A",
            d."Column B",
            (
                select tt."Column A"
                from df as tt
                where tt."Column A" > d."Column A" and tt."Column B" >= d."Column B" * 1.5
                order by tt."Column A" asc
                limit 1
            ) - d."Column A" as "Column C"
        from df as d
    """).pl()
    
    ┌──────────┬──────────┬──────────┐
    │ Column A ┆ Column B ┆ Column C │
    │ ---      ┆ ---      ┆ ---      │
    │ i64      ┆ i64      ┆ i64      │
    ╞══════════╪══════════╪══════════╡
    │ 1        ┆ 2        ┆ 1        │
    │ 2        ┆ 3        ┆ 4        │
    │ 3        ┆ 1        ┆ 1        │
    │ 4        ┆ 4        ┆ 2        │
    │ 5        ┆ 1        ┆ 1        │
    │ 6        ┆ 7        ┆ 3        │
    │ 7        ┆ 3        ┆ 2        │
    │ 8        ┆ 2        ┆ 1        │
    │ 9        ┆ 12       ┆ null     │
    │ 10       ┆ 0        ┆ null     │
    └──────────┴──────────┴──────────┘