
How do string comparisons work when they contain both numbers and letters?


I'm trying to compare times in Python, and I came across some weird comparisons. I've got no idea how the following statements work:

>>> "17:30" > "16:30"
True
>>> "12:30" > "13:30"
False
>>> '18:00 - asdfj' > '16:30 - asdfj'
True

My guess is that it compares the number before the colon, but I'm not completely sure about it.


Solution

  • As others have pointed out, a comparison between strings is a question of lexicographical ordering.

    What that means procedurally:

    • two strings are compared one character at a time, from left to right
    • the first character that's different decides which string is 'greater than' the other
    • two different characters are compared by their 'ordinal value'; the one with the greater value is 'greater'
    • if one string runs out of characters before any difference is found, the longer string is 'greater', because a character is 'greater than' no character
    • if no characters are different and the strings are the same length, they are 'equal'

    For example, 'ab' > 'a' is True, because 'a' == 'a', but the first string has an extra character. And 'abc' < 'abd' because 'c' < 'd'.
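The rules above can be sketched by hand so each step is visible. This is a minimal illustration only: `lex_compare` is a hypothetical helper, not part of Python, and real code should simply use the built-in `<`, `>` and `==` operators. The loop at the end checks the sketch against Python's own ordering on the examples from this thread.

```python
def lex_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, mimicking how Python orders two str objects."""
    # Walk both strings in parallel, one character at a time.
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides, by ordinal value.
            return -1 if ord(ca) < ord(cb) else 1
    # No differing character found: equal length means equal,
    # otherwise the longer string wins (a character beats "no character").
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


# Compare the hand-written version with the built-in operators.
pairs = [("17:30", "16:30"), ("12:30", "13:30"), ("ab", "a"), ("abc", "abd"), ("a", "a")]
for x, y in pairs:
    builtin = (x > y) - (x < y)  # -1, 0 or 1 from Python's own ordering
    assert lex_compare(x, y) == builtin
    print(x, y, lex_compare(x, y))
```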

    'a' < 'b' because ord('a') < ord('b'). The ordinal value of a character is its Unicode code point (https://docs.python.org/3/library/functions.html#ord), which for ASCII characters is the same as its ASCII value. This also means that 'A' < 'a', because the uppercase Latin letters come before the lowercase ones in Unicode, and '1' < 'A', because the digits come before the letters.
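To see the ordinal values behind these comparisons, `ord()` can be printed directly. The word list and the case-insensitive `str.casefold` sort at the end are illustrative additions, not part of the original answer; they just show that `sorted()` uses the same ordering as `<`.

```python
# Code points (ord values) behind the comparisons described above.
for ch in "1Aa":
    print(repr(ch), ord(ch))  # '1' 49, then 'A' 65, then 'a' 97

print("A" < "a")  # True, because 65 < 97
print("1" < "A")  # True, because 49 < 65

# sorted() uses the same ordering: digits, then uppercase, then lowercase.
words = ["banana", "Cherry", "apple", "10:00"]
print(sorted(words))  # ['10:00', 'Cherry', 'apple', 'banana']

# If case should not matter, sort on casefolded copies instead.
print(sorted(words, key=str.casefold))  # ['10:00', 'apple', 'banana', 'Cherry']
```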

    Note that this can give surprising results (note the dots on the Ӓ: it is a separate character, not a Latin 'A', and it has a much higher code point):

    >>> 'Ӓ' > 'a'
    True
    >>> 'A' > 'a'
    False
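One way to investigate a surprising result like this is to print each character's code point and Unicode name with the standard `unicodedata` module. This sketch assumes the dotted character is copied exactly as in the example above; the Latin 'Ä' is an extra character added here only for comparison.

```python
import unicodedata

# Print each character's code point and its official Unicode name.
for ch in ("a", "A", "Ӓ", "Ä"):
    print(repr(ch), hex(ord(ch)), unicodedata.name(ch))

# Each comparison is decided purely by code point.
print("Ӓ" > "a", "A" > "a")  # True False
print("Ä" > "z")  # True: this accented letter sorts after every unaccented lowercase letter
```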

    There are many online tables and overviews of Unicode, but here's a fairly plain example: https://www.tamasoft.co.jp/en/general-info/unicode.html

    As for your example:

    >>> '18:00 - asdfj' > '16:30 - asdfj'
    True

    This makes sense: the first characters are equal ('1' == '1'), and at the second character '8' > '6', so the rest of the string doesn't matter.
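For the original goal of comparing times, note that string order only matches time order when every time uses the same zero-padded `HH:MM` format. The sketch below first finds where the two strings from the example differ, then shows the pitfall and one way around it: parsing with `datetime.strptime` and comparing `datetime.time` objects. `first_difference` and `parse_hhmm` are hypothetical helper names, and the `%H:%M` format is an assumption about how your times are written.

```python
from datetime import datetime, time


def first_difference(a: str, b: str):
    """Return the index of the first differing character, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return None  # equal, or one string is a prefix of the other


a, b = "18:00 - asdfj", "16:30 - asdfj"
i = first_difference(a, b)
print(i, repr(a[i]), repr(b[i]))  # 1 '8' '6'

# String comparison matches time order only for zero-padded "HH:MM".
print("9:30" > "17:30")   # True: '9' > '1', although 9:30 is earlier
print("09:30" > "17:30")  # False: '0' < '1'


def parse_hhmm(s: str) -> time:
    """Parse an 'H:MM' or 'HH:MM' string into a datetime.time."""
    return datetime.strptime(s, "%H:%M").time()


# Comparing parsed times no longer depends on the string's format.
print(parse_hhmm("9:30") > parse_hhmm("17:30"))  # False
```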