Update: This was fixed by pull/20417 in Polars 1.18.0
I'm using .map_elements to apply a complex Python function to every element of a Polars Series. This is a toy example:
import polars as pl
df = pl.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
def sum_cols(row):
    return row["A"] + row["B"]
df.with_columns(
    pl.struct(pl.all())
    .map_elements(sum_cols, return_dtype=pl.Int32).alias("summed")
)
shape: (3, 3)
┌─────┬─────┬────────┐
│ A   ┆ B   ┆ summed │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i32    │
╞═════╪═════╪════════╡
│ 1   ┆ 4   ┆ 5      │
│ 2   ┆ 5   ┆ 7      │
│ 3   ┆ 6   ┆ 9      │
└─────┴─────┴────────┘
However, when my function raises an exception, Polars silently fills the output column with nulls instead of failing:
def sum_cols(row):
    raise Exception
    return row["A"] + row["B"]
df.with_columns(
    pl.struct(pl.all())
    .map_elements(sum_cols, return_dtype=pl.Int32).alias("summed")
)
shape: (3, 3)
┌─────┬─────┬────────┐
│ A   ┆ B   ┆ summed │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i32    │
╞═════╪═════╪════════╡
│ 1   ┆ 4   ┆ null   │
│ 2   ┆ 5   ┆ null   │
│ 3   ┆ 6   ┆ null   │
└─────┴─────┴────────┘
How can I make the Polars command fail when my function raises an exception?
I'm pretty sure this is a bug in Polars.
As a workaround, you could use .map_batches() to pass the whole "column" instead:
import polars as pl
df = pl.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
def sum_cols(col):
    raise Exception
    return pl.Series(row["A"] + row["B"] for row in col)
df.with_columns(
    pl.struct(pl.all()).map_batches(sum_cols)
)
This propagates exceptions as one would expect:
# ComputeError: Exception: