Is there a better way to achieve this kind of row-wise operation?


In my Polars dataframes, I want to add a boolean column that indicates, for each row, whether any of the other columns contains a null value.

I tried various approaches that I found online or that AI suggested, most of them using with_columns and pl.all() to operate on all the columns, but each failed for a different reason, too many to list here.

Can someone suggest a solution for this?

My workaround is to transpose the dataframe, use .select(pl.col("*").is_null().any()), transpose again, and join the result with the original dataframe. That seems inefficient, especially since the Polars documentation describes transposing as an expensive operation.


Solution

  • Using with_columns appends the following columns:

    pl.all().is_null().any().name.suffix("_has_nulls") indicates, per column, whether any row has a null value in that column. The suffix gives each result a new name, so it is appended as a new column instead of replacing the source column. (Older Polars versions spell this .suffix("_has_nulls"); newer releases use .name.suffix.)

    pl.any_horizontal(pl.all().is_null()) indicates whether any column in the current row holds a null value. This is the row-wise check the question asks for.

    pl.any_horizontal(pl.all().is_null().any()).alias("df_has_nulls") indicates whether any value anywhere in the dataframe is null.

    import polars as pl
    
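    df = pl.DataFrame({
        "a": [1, 2, 3, 4],
        "b": ["1", "2", "3", "4"],
        "c": [1, 2, None, 4],
        "d": ["1", "2", "3", None],
    })
    
    df = df.with_columns(
        # Per column: does any row hold a null?
        # (.name.suffix replaces the older .suffix, which newer Polars releases no longer accept)
        pl.all().is_null().any().name.suffix("_has_nulls"),
        # Per row: does any column hold a null?
        pl.any_horizontal(pl.all().is_null()).alias("row_has_nulls"),
        # Whole dataframe: is any value null?
        pl.any_horizontal(pl.all().is_null().any()).alias("df_has_nulls"),
    )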
    
    print(df)
    

    Result:

    shape: (4, 10)
    ┌─────┬─────┬──────┬──────┬─────────────┬─────────────┬─────────────┬─────────────┬───────────────┬──────────────┐
    │ a   ┆ b   ┆ c    ┆ d    ┆ a_has_nulls ┆ b_has_nulls ┆ c_has_nulls ┆ d_has_nulls ┆ row_has_nulls ┆ df_has_nulls │
    │ --- ┆ --- ┆ ---  ┆ ---  ┆ ---         ┆ ---         ┆ ---         ┆ ---         ┆ ---           ┆ ---          │
    │ i64 ┆ str ┆ i64  ┆ str  ┆ bool        ┆ bool        ┆ bool        ┆ bool        ┆ bool          ┆ bool         │
    ╞═════╪═════╪══════╪══════╪═════════════╪═════════════╪═════════════╪═════════════╪═══════════════╪══════════════╡
    │ 1   ┆ 1   ┆ 1    ┆ 1    ┆ false       ┆ false       ┆ true        ┆ true        ┆ false         ┆ true         │
    │ 2   ┆ 2   ┆ 2    ┆ 2    ┆ false       ┆ false       ┆ true        ┆ true        ┆ false         ┆ true         │
    │ 3   ┆ 3   ┆ null ┆ 3    ┆ false       ┆ false       ┆ true        ┆ true        ┆ true          ┆ true         │
    │ 4   ┆ 4   ┆ 4    ┆ null ┆ false       ┆ false       ┆ true        ┆ true        ┆ true          ┆ true         │
    └─────┴─────┴──────┴──────┴─────────────┴─────────────┴─────────────┴─────────────┴───────────────┴──────────────┘