Say I start with:
import polars as pl
ser = pl.Series([[1, 2, 1, 4], [3, 3, 3, 4], [1, 2, 3, 4]])
How can I filter each list so it only has elements which appear at least twice?
Expected output:
shape: (3,)
Series: '' [list[i64]]
[
    [1, 1]
    [3, 3, 3]
    []
]
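To pin down the intended behaviour, here is the same filtering written in plain Python. This is only a reference sketch using collections.Counter, not the polars solution I'm after; the function name keep_repeated and its min_count parameter are made up for illustration.

```python
from collections import Counter


def keep_repeated(lists, min_count=2):
    """Keep, per list, only the elements that occur at least `min_count` times."""
    result = []
    for values in lists:
        counts = Counter(values)  # occurrences of each element within this one list
        result.append([v for v in values if counts[v] >= min_count])
    return result


print(keep_repeated([[1, 2, 1, 4], [3, 3, 3, 4], [1, 2, 3, 4]]))
# [[1, 1], [3, 3, 3], []]
```

Order and duplicates within each list are preserved; only elements that occur once are dropped.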
Is there a way to do this in polars without using apply?
I think the previous solution was:
ser.list.eval(
    pl.element().filter(pl.element().count().over(pl.element()) > 1)
)
However, .over() is no longer valid inside .eval():
https://github.com/pola-rs/polars/issues/8721
It does appear to be possible with .value_counts(), but there must be a simpler way:
ser.list.eval(
    pl.element().filter(
        pl.element().is_in(
            pl.element().value_counts(sort=True).struct[""].filter(
                pl.element().value_counts(sort=True).struct["counts"] > 1
            )
        )
    )
)
shape: (3,)
Series: '' [list[i64]]
[
    [1, 1]
    [3, 3, 3]
    []
]