
Comparison between ordered categorical values in pandas not working as expected


The following code:

import pandas as pd

s2 = pd.Series(['m','l','s','xl','xs'])

size_type = pd.api.types.CategoricalDtype(categories=['xs','s','m','l','xl'], ordered=True)

s3 = s2.astype(size_type)

print(s3)

Yields this result:

0     m
1     l
2     s
3    xl
4    xs
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']

So I expect the "m" size to be greater than the "s" size, according to the order I set when I created the category. But when I check this with a comparison, the result is the opposite:

s3[0] > s3[2]

Yields this result:

False

Why is this happening?


Solution

  • s3[0] and s3[2] return plain Python strings, so > compares them lexicographically ('m' sorts before 's', hence False) rather than by category order. You can use .cat.codes to access the internally stored integer codes and compare those instead:

    s3.cat.codes[0] > s3.cat.codes[2]
    # True
    

    To see .cat.codes in detail:

    s3.cat.codes
    #0    2
    #1    3
    #2    1
    #3    4
    #4    0
    #dtype: int8
    
    s3.cat.codes[0]
    #2
    
    s3.cat.codes[2]
    #1
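
As a complement to the .cat.codes approach, here is a minimal sketch of two other ways to compare by the declared order, assuming a reasonably recent pandas version. The helper name is_larger is hypothetical and not part of pandas or the original answer. The sketch first reproduces the string comparison from the question, then compares the whole Series against a category label (which keeps the categorical ordering), and finally looks up label positions in the dtype's categories.

```python
import pandas as pd

size_type = pd.api.types.CategoricalDtype(
    categories=['xs', 's', 'm', 'l', 'xl'], ordered=True
)
s3 = pd.Series(['m', 'l', 's', 'xl', 'xs']).astype(size_type)

# Indexing a single element yields a plain str, so ">" is lexicographic here.
plain = s3[0] > s3[2]  # same as 'm' > 's'

# Comparing the Series itself against a category label uses the declared order.
above_s = s3 > 's'

# Hypothetical helper: compare two labels by their position in the categories.
def is_larger(a, b, dtype=size_type):
    return dtype.categories.get_loc(a) > dtype.categories.get_loc(b)

print(plain)                    # False
print(above_s.tolist())         # [True, True, False, True, False]
print(is_larger(s3[0], s3[2]))  # True
```

The element-wise form is usually the more convenient one when filtering, for example s3[s3 > 's'], since it avoids pulling individual values out of the Series at all.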