In Pandas, I can specify the levels of a Categorical type myself:
import pandas as pd

MyCat = pd.CategoricalDtype(categories=['A', 'B', 'C'], ordered=True)
my_data = pd.Series(['A', 'A', 'B'], dtype=MyCat)
This means that
Is there a way to do this with Polars? I know the string cache feature can achieve 1) in a different way, but I'd like to know whether my dtype/levels can be specified directly. I'm not aware of any way to achieve 2); however, I think the categorical dtypes in Arrow do allow an optional ordering, so maybe it's possible?
As of Polars 0.20.0, a new pl.Enum type has been added for this purpose. When the categories are known up front, use Enum.
import polars as pl

my_enum = pl.Enum(['A', 'B', 'C'])
my_data = pl.Series(['A', 'A', 'B'], dtype=my_enum)
# shape: (3,)
# Series: '' [enum]
# [
# "A"
# "A"
# "B"
# ]
my_data.to_physical()
# shape: (3,)
# Series: '' [u32]
# [
# 0
# 0
# 1
# ]