Let's say I have the following DataFrame:
import polars as pl

df = pl.DataFrame({
    'values': [[0, 1], [9, 8]],
    'qc_flags': [["", "X"], ["T", ""]]
})
I only want to keep my values if the corresponding qc_flag equals "".
Does anyone know the correct way to go about this?
I've tried something like this:
filtered = df.with_columns(
    pl.col("values").list.eval(
        pl.element().filter(
            pl.col("qc_flags").list.eval(
                pl.element() == ""
            )
        )
    )
)
I would expect to get 'values': [[0], [8]], but then I just end up with this error:
ComputeError: named columns are not allowed in `list.eval`; consider using `element` or `col("")`
You can combine:
- pl.Expr.list.eval() to evaluate an expression within each list.
- pl.arg_where() to find the indexes where the value is "".
- pl.Expr.list.gather() to take the sublist at those indexes.

df.with_columns(
    filtered = pl.col.values.list.gather(
        pl.col.qc_flags.list.eval(pl.arg_where(pl.element() == ""))
    )
)
shape: (2, 3)
┌───────────┬───────────┬───────────┐
│ values    ┆ qc_flags  ┆ filtered  │
│ ---       ┆ ---       ┆ ---       │
│ list[i64] ┆ list[str] ┆ list[i64] │
╞═══════════╪═══════════╪═══════════╡
│ [0, 1]    ┆ ["", "X"] ┆ [0]       │
│ [9, 8]    ┆ ["T", ""] ┆ [8]       │
└───────────┴───────────┴───────────┘
└───────────┴───────────┴───────────┘
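
To make the logic of that expression easier to follow, here is a rough plain-Python sketch of what happens to each row. It is only an illustration, not how Polars works internally. The helper names arg_where_empty and gather are hypothetical stand-ins for pl.arg_where(pl.element() == "") and list.gather, and the rows are the ones from the question.

```python
def arg_where_empty(flags):
    # Indexes at which the flag is the empty string
    # (hypothetical stand-in for pl.arg_where(pl.element() == "") on one row).
    return [i for i, flag in enumerate(flags) if flag == ""]


def gather(values, indexes):
    # Elements of `values` at the given indexes
    # (hypothetical stand-in for list.gather on one row).
    return [values[i] for i in indexes]


# Same data as the DataFrame in the question, one (values, qc_flags) pair per row.
rows = [
    ([0, 1], ["", "X"]),
    ([9, 8], ["T", ""]),
]

filtered = [gather(values, arg_where_empty(flags)) for values, flags in rows]
print(filtered)  # [[0], [8]]
```

The indexes are computed from qc_flags only and then applied to values, which is why this approach never needs to reference a second named column inside list.eval.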