I am trying to compare two dataframes via dfcompare = (df0 == df1), and nulls are never considered identical (unlike join, there is no option to allow nulls to match).
My approach with other fields is to fill them in with an "empty value" appropriate to their datatype. What should I use for structs?
import polars as pl

df = pl.DataFrame(
    {
        "int": [1, 2, None],
        "data": [dict(a=1, b="b"), dict(a=11, b="bb"), None],
    }
)
df.describe()
print(df)

df2 = df.with_columns(pl.col("int").fill_null(0))
df2.describe()
print(df2)

# these error out:...
try:
    df3 = df2.with_columns(pl.col("data").fill_null(dict(a=0, b="")))
except (Exception,) as e:
    print("try#1", e)
try:
    df3 = df2.with_columns(pl.col("data").fill_null(pl.struct(dict(a=0, b=""))))
except (Exception,) as e:
    print("try#2", e)
Output:
shape: (3, 2)
┌──────┬─────────────┐
│ int ┆ data │
│ --- ┆ --- │
│ i64 ┆ struct[2] │
╞══════╪═════════════╡
│ 1 ┆ {1,"b"} │
│ 2 ┆ {11,"bb"} │
│ null ┆ {null,null} │
└──────┴─────────────┘
shape: (3, 2)
┌─────┬─────────────┐
│ int ┆ data │
│ --- ┆ --- │
│ i64 ┆ struct[2] │
╞═════╪═════════════╡
│ 1 ┆ {1,"b"} │
│ 2 ┆ {11,"bb"} │
│ 0 ┆ {null,null} │
└─────┴─────────────┘
try#1 invalid literal value: "{'a': 0, 'b': ''}"
try#2 a
Error originated just after this operation:
DF ["int", "data"]; PROJECT */2 COLUMNS; SELECTION: "None"
My satisfactory workaround has been to unnest the columns instead. This works fine (even better, as it allows subfield-by-subfield fills). Still, I remain curious about how to build a suitable "struct literal" that can be passed into these types of functions.
One can also imagine wanting to add a hardcoded column as in df4 = df.with_columns(pl.lit("0").alias("zerocol"))
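A struct literal for use with pl.Expr.fill_null can be created with pl.struct, as follows. pl.struct works on expressions (or column names), not plain Python values, so each field value needs to be wrapped in pl.lit.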
df.with_columns(
    pl.col("data").fill_null(
        pl.struct(a=pl.lit(1), b=pl.lit("MISSING"))
    )
)
shape: (3, 2)
┌──────┬───────────────┐
│ int ┆ data │
│ --- ┆ --- │
│ i64 ┆ struct[2] │
╞══════╪═══════════════╡
│ 1 ┆ {1,"b"} │
│ 2 ┆ {11,"bb"} │
│ null ┆ {1,"MISSING"} │
└──────┴───────────────┘