Update: numpy.random.choice is no longer parsed as an Object type. The example produces a String column as expected, without any casting needed.
I have a pl.LazyFrame with a column of type Object that contains date representations; it also includes missing values (None).
As a first step, I would like to convert the column from Object to String; however, this results in a ComputeError. I cannot figure out why. I suppose it is due to the None values, but sadly I cannot drop them at this point.
import numpy as np
import polars as pl
rng = np.random.default_rng(12345)
df = pl.LazyFrame(
    data={
        "date": rng.choice(
            [None, "03.04.1998", "03.05.1834", "05.06.2025"], 100
        ),
    }
)
df.with_columns(pl.col("date").cast(pl.String)).collect()
When Polars assigns the pl.Object type, it essentially means: "I do not understand what this is."
By the time you end up with this type, it is generally too late to do anything useful with it.
In this particular case, numpy.random.choice is creating a NumPy array of dtype=object:
>>> rng.choice([None, "foo"], 3)
array([None, None, 'foo'], dtype=object)
Polars has native .sample() functionality, which you could use to create your data instead.
df = pl.select(
    date=pl.Series([None, "03.04.1998", "03.05.1834", "05.06.2025"])
    .sample(100, with_replacement=True)
)
# shape: (100, 1)
# ┌────────────┐
# │ date       │
# │ ---        │
# │ str        │
# ╞════════════╡
# │ null       │
# │ 05.06.2025 │
# │ 03.05.1834 │
# │ 03.04.1998 │
# │ …          │
# │ null       │
# │ 03.04.1998 │
# │ 03.05.1834 │
# │ 03.04.1998 │
# └────────────┘