Search code examples
dataframepython-polars

Polars Finding the First Meeting Criteria According to Another DataFrame


The main dataframe is as follows:

test = pl.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
        "value": [1, 2, 3, 4, 5],
        }
    ).with_row_index().with_columns(pl.col("date").cast(pl.Date))
print(test)

out:
shape: (5, 3)
┌───────┬────────────┬───────┐
│ index ┆ date       ┆ value │
│ ---   ┆ ---        ┆ ---   │
│ u32   ┆ date       ┆ i64   │
╞═══════╪════════════╪═══════╡
│ 0     ┆ 2020-01-01 ┆ 1     │
│ 1     ┆ 2020-01-02 ┆ 2     │
│ 2     ┆ 2020-01-03 ┆ 3     │
│ 3     ┆ 2020-01-04 ┆ 4     │
│ 4     ┆ 2020-01-05 ┆ 5     │
└───────┴────────────┴───────┘

I want to find the first index that is before another dataframe with a shape (1, 1):

shape: (1, 1)
┌────────────┐
│ date       │
│ ---        │
│ date       │
╞════════════╡
│ 2020-01-04 │
└────────────┘

Expected output: 2 or "2020-01-03"


Solution

  • First, you can obtain the date in the dataframe other as datetime.date using pl.DataFrame.item().

    other.item()
    
    datetime.date(2020, 1, 4)
    

    We can use this to filter the main dataframe and, subsequently, select the last row in the filtered dataframe.

    (
        test
        .filter(
            pl.col("date") < other.item()
        )
        .select(
            pl.col("index").last()
        )
        .item()
    )
    
    2
    

    Note. If you'd like the corresponding date datetime.date(2020, 1, 3), you can similarly select the date column instead.