Tags: python, dataframe, data-science, python-polars

What is the recommended way to retrieve row numbers (index) in polars?


I know polars does not support an index by design, so df.filter(expr).index isn't an option. Another way I can think of is to add a new column before applying any filters, but I'm not sure whether this is the optimal way to do it in polars:

df.with_columns(pl.Series('index', range(len(df)))).filter(expr)['index']

Solution

  • Use with_row_index():

    import polars as pl
    df = pl.DataFrame([pl.Series("a", [5, 9, 6]), pl.Series("b", [8, 3, 4])])
    
    In [20]: df.with_row_index()
    Out[20]: 
    shape: (3, 3)
    ┌────────┬─────┬─────┐
    │ index  ┆ a   ┆ b   │
    │ ---    ┆ --- ┆ --- │
    │ u32    ┆ i64 ┆ i64 │
    ╞════════╪═════╪═════╡
    │ 0      ┆ 5   ┆ 8   │
    │ 1      ┆ 9   ┆ 3   │
    │ 2      ┆ 6   ┆ 4   │
    └────────┴─────┴─────┘
    
    # Start from 1 instead of 0.
    In [21]: df.with_row_index(offset=1)
    Out[21]: 
    shape: (3, 3)
    ┌────────┬─────┬─────┐
    │ index  ┆ a   ┆ b   │
    │ ---    ┆ --- ┆ --- │
    │ u32    ┆ i64 ┆ i64 │
    ╞════════╪═════╪═════╡
    │ 1      ┆ 5   ┆ 8   │
    │ 2      ┆ 9   ┆ 3   │
    │ 3      ┆ 6   ┆ 4   │
    └────────┴─────┴─────┘
    
    # Start from 1 and call column "my_index".
    In [22]: df.with_row_index(name="my_index", offset=1)
    Out[22]: 
    shape: (3, 3)
    ┌──────────┬─────┬─────┐
    │ my_index ┆ a   ┆ b   │
    │ ---      ┆ --- ┆ --- │
    │ u32      ┆ i64 ┆ i64 │
    ╞══════════╪═════╪═════╡
    │ 1        ┆ 5   ┆ 8   │
    │ 2        ┆ 9   ┆ 3   │
    │ 3        ┆ 6   ┆ 4   │
    └──────────┴─────┴─────┘