Search code examples
pythonpython-polars

polars full join default null value


https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html

Can I specify the default NULL value for full joins? Like 0?


Solution

  • The join method does not currently have an option for setting a default value for nulls. However, there is an easy way to accomplish this.

    Let's say we have this data:

    import polars as pl
    
    df1 = pl.DataFrame({"key": ["a", "b", "d"], "var1": [1, 1, 1]})
    
    df2 = pl.DataFrame({"key": ["a", "b", "c"], "var2": [2, 2, 2]})
    
    df1.join(df2, on="key", how="full")
    
    shape: (4, 4)
    ┌──────┬──────┬───────────┬──────┐
    │ key  ┆ var1 ┆ key_right ┆ var2 │
    │ ---  ┆ ---  ┆ ---       ┆ ---  │
    │ str  ┆ i64  ┆ str       ┆ i64  │
    ╞══════╪══════╪═══════════╪══════╡
    │ a    ┆ 1    ┆ a         ┆ 2    │
    │ b    ┆ 1    ┆ b         ┆ 2    │
    │ null ┆ null ┆ c         ┆ 2    │
    │ d    ┆ 1    ┆ null      ┆ null │
    └──────┴──────┴───────────┴──────┘
    

    To create a different value for the null values, simply use this:

    df1.join(df2, on="key", how="full").with_columns(pl.exclude("key", "key_right").fill_null(0))
    
    shape: (4, 4)
    ┌──────┬──────┬───────────┬──────┐
    │ key  ┆ var1 ┆ key_right ┆ var2 │
    │ ---  ┆ ---  ┆ ---       ┆ ---  │
    │ str  ┆ i64  ┆ str       ┆ i64  │
    ╞══════╪══════╪═══════════╪══════╡
    │ a    ┆ 1    ┆ a         ┆ 2    │
    │ b    ┆ 1    ┆ b         ┆ 2    │
    │ null ┆ 0    ┆ c         ┆ 2    │
    │ d    ┆ 1    ┆ null      ┆ 0    │
    └──────┴──────┴───────────┴──────┘