Given a Polars DataFrame, is there a way to replace a particular value by "null"? For example, if there's a sentinel value like "_UNKNOWN"
and I want to make it truly missing in the dataframe instead.
Update:
Expr.replace()
has also since been added to Polars.
df.with_columns(pl.col(pl.String).replace("_UNKNOWN", None))
shape: (4, 3)
┌──────┬──────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞══════╪══════╪═════╡
│ a ┆ null ┆ 1 │
│ b ┆ d ┆ 2 │
│ null ┆ e ┆ 3 │
│ c ┆ f ┆ 4 │
└──────┴──────┴─────┘
You can use .when().then().otherwise()
pl.col(pl.String)
is used to select all "string columns".
df = pl.DataFrame({
"A": ["a", "b", "_UNKNOWN", "c"],
"B": ["_UNKNOWN", "d", "e", "f"],
"C": [1, 2, 3, 4]
})
df.with_columns(
pl.when(pl.col(pl.String) == "_UNKNOWN")
.then(None)
.otherwise(pl.col(pl.String)) # keep original value
.name.keep()
)