
Comparing string and unicode in Python 2.7.5


I wonder why, when I do:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

gives me False?

It's really giving me a headache; it almost seems designed to drive people mad...


Solution

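    Why does this happen?

    In Python 2.x, comparing a byte string (str) with a unicode object makes Python implicitly decode the byte string using the default ASCII codec. 'k' is plain ASCII, so it decodes to u'k' and the comparison succeeds. 'ę', however, typed in a UTF-8 terminal (or written in a UTF-8 source file), is the two bytes '\xc4\x99', which cannot be decoded as ASCII. Python then gives up, treats the two values as unequal and emits a warning rather than raising an error: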

    Warning (from warnings module):
      File "__main__", line 1
    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
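To see what Python 2 is doing behind the scenes, here is a small sketch that performs the same conversion by hand. It assumes the byte strings came from a UTF-8 terminal or source file, and implicit_ascii_decode is a made-up name used only for illustration. It runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: redo by hand the conversion Python 2 performs implicitly when a
# byte string (str) is compared with a unicode object. Runs on 2.7 and 3.x.

def implicit_ascii_decode(raw):
    """Hypothetical helper mimicking Python 2's implicit ASCII decode.

    Returns the decoded text, or None where Python 2 would give up,
    emit a UnicodeWarning and treat the operands as unequal.
    """
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        return None

a = [u'k', u'ę', u'ą']

# Assumption: the byte strings came from a UTF-8 terminal or source file.
k_bytes = u'k'.encode('utf-8')   # one ASCII byte: 'k'
e_bytes = u'ę'.encode('utf-8')   # two non-ASCII bytes: '\xc4\x99'

print(implicit_ascii_decode(k_bytes) in a)  # True: u'k' is in the list
print(implicit_ascii_decode(e_bytes) in a)  # False: decoding failed (None)
```

The 'k' case mirrors the True result from the question, and the 'ę' case mirrors the False result.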
    

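    Python 3.x does not have this problem: every str is a Unicode string, so 'ę' in a simply returns True. Byte strings (bytes) are a separate type there and are never implicitly decoded; comparing bytes with str just returns False.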
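For comparison, a minimal Python 3 only sketch of the same test. No u prefix is needed, and the bytes/str mismatch shows up as a plain False instead of a decode attempt:

```python
# Python 3 only: plain string literals are already Unicode (str).
a = ['k', 'ę', 'ą']

print('ę' in a)                  # True: str compared with str
print('ę'.encode('utf-8') in a)  # False: bytes never compare equal to str
```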

    Solution?

    You can either make the left-hand operand a unicode literal as well:

    >>> u'ę' in a
    True
    

    Now you're comparing two unicode objects, so no implicit conversion is needed.

    Or encode both sides with the same encoding, for example UTF-8, before comparing:

    >>> c = u'ę'
    >>> u'ę'.encode('utf-8') == c.encode('utf-8')
    True
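Going the other way is often more robust: decode any byte string to unicode as soon as it enters the program, then compare unicode with unicode. A minimal sketch, assuming the byte strings are UTF-8; to_text is an illustrative name, not a standard-library function. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: normalise byte strings to unicode before comparing (2.7 and 3.x).

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as unicode, decoding bytes first.

    Assumes byte strings use `encoding` (UTF-8 by default).
    """
    if isinstance(value, bytes):   # in Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value

a = [u'k', u'ę', u'ą']
raw = u'ę'.encode('utf-8')  # stands in for 'ę' typed in a UTF-8 terminal

print(to_text(raw) in a)    # True
print(to_text(u'ą') in a)   # True: unicode input is returned unchanged
```

If the input comes from somewhere that is not UTF-8, pass the matching codec name as the encoding argument.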
    

    Also, to use non-ASCII characters in a Python 2 source file, you have to declare the source encoding on the first or second line of the file (PEP 263); otherwise Python 2 raises a SyntaxError:

    # -*- coding: utf-8 -*-
    
    # ... the rest of the program
    

    Hope this helps!