Search code examples
pythonpython-polars

Replace value by null in Polars


Given a Polars DataFrame, is there a way to replace a particular value by "null"? For example, if there's a sentinel value like "_UNKNOWN" and I want to make it truly missing in the dataframe instead.


Solution

  • Update: Expr.replace() has also since been added to Polars.

    df.with_columns(pl.col(pl.String).replace("_UNKNOWN", None))
    
    shape: (4, 3)
    ┌──────┬──────┬─────┐
    │ A    ┆ B    ┆ C   │
    │ ---  ┆ ---  ┆ --- │
    │ str  ┆ str  ┆ i64 │
    ╞══════╪══════╪═════╡
    │ a    ┆ null ┆ 1   │
    │ b    ┆ d    ┆ 2   │
    │ null ┆ e    ┆ 3   │
    │ c    ┆ f    ┆ 4   │
    └──────┴──────┴─────┘
    

    You can use .when().then().otherwise()

    pl.col(pl.String) is used to select all "string columns".

    df = pl.DataFrame({
       "A": ["a", "b", "_UNKNOWN", "c"], 
       "B": ["_UNKNOWN", "d", "e", "f"], 
       "C": [1, 2, 3, 4]
    })
    
    df.with_columns(
       pl.when(pl.col(pl.String) == "_UNKNOWN")
         .then(None)
         .otherwise(pl.col(pl.String)) # keep original value
         .name.keep()
    )