python, python-polars

Filter a LazyFrame by row index


Is there an idiomatic way to get specific rows from a LazyFrame? There are two methods I could figure out, but I'm not sure which is better, or whether there's a different method I should use.

import polars as pl

df = pl.DataFrame({"x": ["a", "b", "c", "d"]}).lazy()

rows = [1, 3]

# method 1
(
  df.with_row_index("row_number")
  .filter(pl.col("row_number").is_in(rows))
  .drop("row_number")
  .collect()
)

# method 2
df.select(pl.all().gather(rows)).collect()

Solution

  • pl.Expr.gather is the idiomatic way to take values by index.

    df.select(pl.all().gather(rows)).collect()
    

    For completeness, method 1 can be refined by using an expression for the index. This way no temporary column is created and dropped again.

    # method 1.1
    df.filter(pl.int_range(pl.len()).is_in(rows)).collect()