https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.join.html
Can I specify the default NULL value for full joins? Like 0?
The join
method does not currently have an option for setting a default value for nulls. However, there is an easy way to accomplish this.
Let's say we have this data:
import polars as pl
df1 = pl.DataFrame({"key": ["a", "b", "d"], "var1": [1, 1, 1]})
df2 = pl.DataFrame({"key": ["a", "b", "c"], "var2": [2, 2, 2]})
df1.join(df2, on="key", how="full")
shape: (4, 4)
┌──────┬──────┬───────────┬──────┐
│ key ┆ var1 ┆ key_right ┆ var2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪═══════════╪══════╡
│ a ┆ 1 ┆ a ┆ 2 │
│ b ┆ 1 ┆ b ┆ 2 │
│ null ┆ null ┆ c ┆ 2 │
│ d ┆ 1 ┆ null ┆ null │
└──────┴──────┴───────────┴──────┘
To create a different value for the null
values, simply use this:
df1.join(df2, on="key", how="full").with_columns(pl.exclude("key", "key_right").fill_null(0))
shape: (4, 4)
┌──────┬──────┬───────────┬──────┐
│ key ┆ var1 ┆ key_right ┆ var2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪═══════════╪══════╡
│ a ┆ 1 ┆ a ┆ 2 │
│ b ┆ 1 ┆ b ┆ 2 │
│ null ┆ 0 ┆ c ┆ 2 │
│ d ┆ 1 ┆ null ┆ 0 │
└──────┴──────┴───────────┴──────┘