python-polars

How to get index corresponding to quantile in Polars List?


Suppose I have the following DataFrame:

import polars as pl

df = pl.DataFrame({'x':[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]]})

To get the nth percentile, I can do the following:

list_quantile_30 = pl.element().quantile(0.3)
df.select(pl.col('x').list.eval(list_quantile_30))

But I can't figure out how to get the index corresponding to the percentile. Here is how I would do it using NumPy:

import numpy as np
series = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
np.searchsorted(series, np.percentile(series, 30))
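One detail is worth checking when comparing against this NumPy version. np.percentile interpolates linearly by default, while Polars' quantile defaults to "nearest". The searched value, and therefore the index, can differ between the two. Passing interpolation="linear" to quantile in Polars is one way to line them up. The sketch below uses a made-up list where the two methods disagree and assumes NumPy 1.22 or later, where the keyword is method:

```python
import numpy as np

# Hypothetical sorted data where linear and nearest interpolation disagree
a = np.array([0, 10, 20, 30])

linear = np.percentile(a, 10)                     # NumPy default: linear, about 3.0
nearest = np.percentile(a, 10, method="nearest")  # picks an actual element: 0

# searchsorted defaults to side="left": first index i with value <= a[i]
print(np.searchsorted(a, linear))   # 1
print(np.searchsorted(a, nearest))  # 0
```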

Is there an idiomatic Polars way to do this without using map_elements?


Solution

  • Continuing from your example, you could use pl.arg_where to get the indices where a condition holds and then take the first one.

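    import polars as pl

    df = pl.DataFrame({'x':[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]]})
    
    list_quantile_30 = pl.element().quantile(0.3)
    
    # Within each list, find the positions whose element is >= that list's
    # 30% quantile and keep the first one; flatten turns the resulting
    # one-element list into a plain column.
    df.with_columns(pl.col('x').list.eval(
        pl.arg_where(list_quantile_30 <= pl.element()).first()
    ).flatten().alias("arg_where"))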
    
    shape: (1, 2)
    ┌────────────────┬───────────┐
    │ x              ┆ arg_where │
    │ ---            ┆ ---       │
    │ list[i64]      ┆ u32       │
    ╞════════════════╪═══════════╡
    │ [0, 2, ... 20] ┆ 3         │
    └────────────────┴───────────┘
    

    This convinces me to add a pl.search_sorted to Polars as well.