
Filling `null` values of a column with another column


I want to fill the null values of a column with the value of another column in the same row, in a lazy DataFrame (LazyFrame) in Polars.

Is this possible with reasonable performance?


Solution

  • There's a function for this: `fill_null`. It accepts an expression as well as a scalar value, so you can pass another column to it.

    Let's say we have this data:

    import polars as pl
    
    df = pl.DataFrame({'a': [1, None, 3, 4],
                       'b': [10, 20, 30, 40]
                       }).lazy()
    print(df.collect())
    
    shape: (4, 2)
    ┌──────┬─────┐
    │ a    ┆ b   │
    │ ---  ┆ --- │
    │ i64  ┆ i64 │
    ╞══════╪═════╡
    │ 1    ┆ 10  │
    │ null ┆ 20  │
    │ 3    ┆ 30  │
    │ 4    ┆ 40  │
    └──────┴─────┘
    

    We can fill the null values in column a with values in column b:

    df.with_columns(pl.col('a').fill_null(pl.col('b'))).collect()
    
    shape: (4, 2)
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 1   ┆ 10  │
    │ 20  ┆ 20  │
    │ 3   ┆ 30  │
    │ 4   ┆ 40  │
    └─────┴─────┘
    

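    Performance should be good: `fill_null` is a vectorized expression evaluated by Polars' native query engine, so no Python code runs per row, and it stays lazy until you call `collect()`.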