The main dataframe is as follows:
test = pl.DataFrame(
{
"date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
"value": [1, 2, 3, 4, 5],
}
).with_row_index().with_columns(pl.col("date").cast(pl.Date))
print(test)
out:
shape: (5, 3)
┌───────┬────────────┬───────┐
│ index ┆ date ┆ value │
│ --- ┆ --- ┆ --- │
│ u32 ┆ date ┆ i64 │
╞═══════╪════════════╪═══════╡
│ 0 ┆ 2020-01-01 ┆ 1 │
│ 1 ┆ 2020-01-02 ┆ 2 │
│ 2 ┆ 2020-01-03 ┆ 3 │
│ 3 ┆ 2020-01-04 ┆ 4 │
│ 4 ┆ 2020-01-05 ┆ 5 │
└───────┴────────────┴───────┘
I want to find the first index that is before another dataframe with a shape (1, 1):
shape: (1, 1)
┌────────────┐
│ date │
│ --- │
│ date │
╞════════════╡
│ 2020-01-04 │
└────────────┘
Expected output: 2 or "2020-01-03"
First, you can obtain the date in the dataframe other
as datetime.date
using pl.DataFrame.item()
.
other.item()
datetime.date(2020, 1, 4)
We can use this to filter the main dataframe and, subsequently, select the last row in the filtered dataframe.
(
test
.filter(
pl.col("date") < other.item()
)
.select(
pl.col("index").last()
)
.item()
)
2
Note. If you'd like the corresponding date datetime.date(2020, 1, 3)
, you can similarly select the date
column instead.