
Create Categorical series from physical values


I want to create a categorical column, where each category has a descriptive name for self-documentation. I have a list of integers equivalent to the physical values in the categorical column, and I want to make the categorical column without creating an intermediate list of strings to pass to pl.Series.

import polars as pl

dt = pl.Enum(["0", "1", "2"])
s1 = pl.Series(["0", "0", "2", "1"], dtype=dt)
physical = list(s1.to_physical())
print(f"{physical=}")
s2 = pl.Series([str(p) for p in physical], dtype=dt)
assert s1.equals(s2)

# Converting the physical values to strings just to build a series that is stored as ints is a waste of compute power.
# How can the series be constructed directly from the physical values?
s3 = pl.Series.from_physical(physical, dtype=dt)  # hypothetical method, does not exist
assert s1.equals(s3)

This prints

physical=[0, 0, 2, 1]

Then it errors because Series.from_physical doesn't exist. Is there a function like from_physical that would make this snippet run to completion without erroring on the final assertion?


Solution

  • You can simply cast the physical (integer) series to the enum datatype.

    assert s1.equals(s1.to_physical().cast(dt)) # True
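
    The same cast should also work when starting from a plain Python list of integers, as in the question. The following is a minimal sketch, assuming your polars version supports casting unsigned integers to an Enum; the explicit UInt32 dtype is an assumption made here so that the cast does not start from the default Int64 inference.

    ```python
    import polars as pl

    dt = pl.Enum(["0", "1", "2"])
    physical = [0, 0, 2, 1]  # plain Python ints, indices into the Enum's categories

    # Assumption: build an unsigned integer series first, then cast it to the Enum.
    # No intermediate list of strings is created.
    s3 = pl.Series(physical, dtype=pl.UInt32).cast(dt)

    # Reference series built the usual way, from strings, for comparison.
    s1 = pl.Series(["0", "0", "2", "1"], dtype=dt)
    assert s1.equals(s3)
    ```

    Each integer is treated as an index into the category list, so it has to be smaller than the number of categories.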