python, dataframe, python-polars

Keep only rows that have at least one null


I am trying to do basically the opposite of drop_nulls(). I want to keep all rows that have at least one null.

I want to do something like the following, but without listing every column explicitly:

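import sys

import polars as pl

# Report every name that appears in a row with a null in a, b, or c.
for (name,) in (
    df.filter(
        pl.col("a").is_null()
        | pl.col("b").is_null()
        | pl.col("c").is_null()
    )
    .select("name")
    .unique()
    .rows()
):
    print(
        f"Ignoring `{name}` because it has at least one null",
        file=sys.stderr,
    )
# Then drop those rows.
df = df.drop_nulls()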

Solution

  • It sounds like you are looking for pl.any_horizontal (a top-level function, not a method on pl.Expr). The following keeps every row that contains at least one null value in any column.

    df.filter(pl.any_horizontal(pl.all().is_null()))