Suppose I have the following DataFrame:
import polars as pl
df = pl.DataFrame({'x':[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]]})
To get the nth percentile, I can do the following:
list_quantile_30 = pl.element().quantile(0.3)
df.select(pl.col('x').list.eval(list_quantile_30))
But I can't figure out how to get the index corresponding to that percentile. Here is how I would do it with NumPy:
import numpy as np
series = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
np.searchsorted(series, np.percentile(series, 30))
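For reference, here is a minimal standard-library sketch of what that NumPy one-liner computes. It assumes the input list is already sorted and non-empty and uses the same linear interpolation rule as NumPy's default percentile method. The helper names `percentile_linear` and `percentile_index` are hypothetical, not part of any library.

```python
from bisect import bisect_left


def percentile_linear(sorted_values, q):
    """q-th percentile (q in [0, 100]) with linear interpolation between
    the two closest ranks, like NumPy's default method."""
    n = len(sorted_values)
    pos = (n - 1) * q / 100.0  # fractional rank of the percentile
    lo = int(pos)              # floor, since pos >= 0
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def percentile_index(sorted_values, q):
    """Leftmost position where the percentile could be inserted,
    like np.searchsorted with its default side='left'."""
    return bisect_left(sorted_values, percentile_linear(sorted_values, q))


series = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(percentile_linear(series, 30))  # 6.0
print(percentile_index(series, 30))   # 3
```

Because `bisect_left` returns the leftmost insertion point, a percentile that equals an element (6 here) maps to that element's own index, i.e. the first element greater than or equal to the percentile.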
Is there a Polars-native way to do this without using map_elements?
Continuing from your example, you could use pl.arg_where to search for a condition, here the index of the first element that is greater than or equal to the quantile:
import polars as pl
df = pl.DataFrame({'x':[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]]})
list_quantile_30 = pl.element().quantile(0.3)
# per list: index of the first element that is >= the 30% quantile
df.with_columns(
    pl.col('x').list.eval(
        pl.arg_where(list_quantile_30 <= pl.element()).first()
    ).flatten().alias("arg_where")
)
shape: (1, 2)
┌────────────────┬───────────┐
│ x              ┆ arg_where │
│ ---            ┆ ---       │
│ list[i64]      ┆ u32       │
╞════════════════╪═══════════╡
│ [0, 2, ... 20] ┆ 3         │
└────────────────┴───────────┘
This convinces me to add a pl.search_sorted to Polars as well.