
Keep elements in a list of lists which appear at least twice


Say I start with:

import polars as pl
ser = pl.Series([[1, 2, 1, 4], [3, 3, 3, 4], [1, 2, 3, 4]])

How can I filter each list so it only has elements which appear at least twice?

Expected output:

shape: (3,)
Series: '' [list[i64]]
[
        [1, 1]
        [3, 3, 3]
        []
]

Is there a way to do this in Polars without using apply?


Solution

  • I think the previous solution was:

    ser.list.eval(
        pl.element().filter(pl.element().count().over(pl.element()) > 1)
    )
    

    However, .over() is no longer allowed inside .eval(); see https://github.com/pola-rs/polars/issues/8721

    It does appear to be possible with .value_counts(), but there must be a simpler way:

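    ser.list.eval(
       pl.element().filter(
          # keep an element if it is one of the values whose count exceeds 1
          pl.element().is_in(
             # value_counts() yields one struct per distinct value: the value itself
             # (field "", the name of pl.element()) and its count (field "counts";
             # newer Polars releases call this field "count")
             pl.element().value_counts(sort=True).struct[""].filter(
                pl.element().value_counts(sort=True).struct["counts"] > 1
             )
          )
       )
    )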
    
    shape: (3,)
    Series: '' [list[i64]]
    [
        [1, 1]
        [3, 3, 3]
        []
    ]