python, pandas, categorical-data

Comparing two elements of a categorical variable according to their ordered categorization


In Python, I create a categorical variable like this:

import pandas as pd

x = pd.Categorical(["Hi", "Lo", "Med", "Zer", "Lo", "Zer", "Lo", "Hi"], categories=["Zer", "Lo", "Med", "Hi"], ordered=True)

I want to compare element 0 with element 1. Given the category order, "Hi" should be greater than "Lo". So why does x[0] > x[1] return False?

How do I compare two elements of a categorical variable according to their ordered categorization?


Solution

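  • Indexing a single element gives you back a plain Python string, so all information about the category order is lost. The expression x[0] > x[1] is then an ordinary string comparison, which is lexicographic: "H" sorts before "L", so "Hi" > "Lo" is False: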

    type(x[0])
    # str
    

    To get a comparison that respects the category order, keep the data as a Categorical by indexing with a list:

    x[[0]] > x[[1]]
    # array([ True])
    

    When using pandas/NumPy you generally want to perform vectorized operations, i.e. handle many items or comparisons at once.
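If you really do need a single True/False for two individual elements, one option (a sketch, not the only approach) is to compare the integer category codes. `Categorical.codes` stores each element's position in `categories`, so for an ordered categorical a larger code means a higher category. Caveat: missing values get the code -1, so they would compare as "lowest" here, which may not be what you want. The snippet rebuilds `x` from the question and also shows how to unwrap the length-1 array returned by the list-indexing approach.

```python
import pandas as pd

x = pd.Categorical(
    ["Hi", "Lo", "Med", "Zer", "Lo", "Zer", "Lo", "Hi"],
    categories=["Zer", "Lo", "Med", "Hi"],
    ordered=True,
)

# Plain str comparison is lexicographic: "H" < "L", hence False.
print(x[0] > x[1])  # False

# x.codes[i] is the position of element i in x.categories
# ("Hi" -> 3, "Lo" -> 1); -1 would mean a missing value.
print(x.codes[0] > x.codes[1])  # True

# Alternatively keep length-1 Categoricals and take the single boolean out.
print(bool((x[[0]] > x[[1]])[0]))  # True
```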
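A minimal sketch of the vectorized style, again using the `x` from the question: compare every element against a category label in one go, compare each element with its predecessor, and use the order-aware `min`/`max`. The variable names `above_lo` and `rising` are only illustrative. Comparing two Categoricals like this requires them to share the same categories and ordering (slices of the same `x` do); otherwise pandas raises a `TypeError`.

```python
import pandas as pd

x = pd.Categorical(
    ["Hi", "Lo", "Med", "Zer", "Lo", "Zer", "Lo", "Hi"],
    categories=["Zer", "Lo", "Med", "Hi"],
    ordered=True,
)

# Every element vs. one category label; returns a boolean NumPy array.
above_lo = x > "Lo"
print(above_lo)  # [ True False  True False False False False  True]

# Each element vs. the one before it; both slices keep x's categories.
rising = x[1:] > x[:-1]
print(rising)

# min/max follow the declared category order, not alphabetical order.
print(x.min(), x.max())  # Zer Hi
```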