python-polars, rust-polars

In Polars, can I create a categorical type with levels I specify myself?


In Pandas, I can specify the levels of a Categorical type myself:

import pandas as pd

# Ordered categorical dtype with explicitly chosen levels
MyCat = pd.CategoricalDtype(categories=['A','B','C'], ordered=True)
my_data = pd.Series(['A','A','B'], dtype=MyCat)

This means that

  1. I can make sure that different columns and datasets use the same dtype
  2. I can specify an ordering for the levels.

Is there a way to do this with Polars? I know the string cache feature can achieve 1) in a different way, but I'd like to know whether the dtype and its levels can be specified directly. I'm not aware of any way to achieve 2). However, I think the categorical dtypes in Arrow allow an optional ordering, so maybe it's possible?


Solution

  • As of Polars 0.20.0, a new pl.Enum type has been added for this purpose.

    When the categories are known up front, use Enum.

    import polars as pl

    # Fixed set of categories; the physical codes follow this order
    my_enum = pl.Enum(['A', 'B', 'C'])
    my_data = pl.Series(['A', 'A', 'B'], dtype=my_enum)
    
    # shape: (3,)
    # Series: '' [enum]
    # [
    #     "A"
    #     "A"
    #     "B"
    # ]
    
    my_data.to_physical()
    
    # shape: (3,)
    # Series: '' [u32]
    # [
    #     0
    #     0
    #     1
    # ]