Code to generate the toy dataset:
import itertools

import numpy as np
import polars as pl

first = 30
second = 50
third = 40

data = {
    "a": np.concatenate(
        (np.repeat(1, first), np.repeat(2, second), np.repeat(3, third))
    ),
    "b": np.concatenate(
        (
            sorted(np.random.randint(1, first, size=first)),
            sorted(np.random.randint(1, second, size=second)),
            sorted(np.random.randint(1, third, size=third)),
        )
    ),
}
d = [
    np.tile(np.random.randint(1, first * 2, size=first), (first, 1)).tolist(),
    np.tile(np.random.randint(1, second * 2, size=second), (second, 1)).tolist(),
    np.tile(np.random.randint(1, third * 2, size=third), (third, 1)).tolist(),
]
data["d"] = list(itertools.chain.from_iterable(d))

df = pl.DataFrame(data)
pl_df = df.with_columns([pl.col("a").cum_count().over("a", "b").alias("c")])
pl_df.select(["a", "b", "c", "d"]).head()
I'm using polars 0.20.3. For each row, I'd like to use the value in column b as an index into the list of values stored in another column (d) and take the element at that position.
How can I achieve that without iterating over the rows of the dataframe?
Thanks in advance
For this, pl.Expr.list.get can be used as follows.
(
    df
    .with_columns(
        pl.col("d").list.get(pl.col("b") - 1).alias("res")
    )
)
Note that I am passing pl.col("b") - 1 to the index parameter of pl.Expr.list.get to mimic the 1-indexed array operation mentioned in the question (the first list element being assigned index 1 instead of index 0).
shape: (120, 4)
┌─────┬─────┬────────────────────┬─────┐
│ a   ┆ b   ┆ d                  ┆ res │
│ --- ┆ --- ┆ ---                ┆ --- │
│ i64 ┆ i64 ┆ list[i64]          ┆ i64 │
╞═════╪═════╪════════════════════╪═════╡
│ 1   ┆ 1   ┆ [31, 25, 45, … 59] ┆ 31  │
│ 1   ┆ 3   ┆ [31, 25, 45, … 59] ┆ 45  │
│ 1   ┆ 4   ┆ [31, 25, 45, … 59] ┆ 25  │
│ 1   ┆ 8   ┆ [31, 25, 45, … 59] ┆ 24  │
│ 1   ┆ 8   ┆ [31, 25, 45, … 59] ┆ 24  │
│ …   ┆ …   ┆ …                  ┆ …   │
└─────┴─────┴────────────────────┴─────┘