Can't make unicode string literal with xor symbol in Python?


I'm trying to print the xor symbol in Python (𐌈).

I can print a universal quantifier just fine:

>>> print u"\u2200"
∀

But when I do xor, it prints 8 instead:

>>> print u"\u10308"
8

Why?


Solution

  • When you specify a Unicode character with u'\uXXXX', the XXXX must be exactly 4 hex digits. To specify a character with 8 hex digits, you must use a capital U: u'\UXXXXXXXX'.

    So u'\u10308' is actually two characters, u'\u1030' followed by u'8'.

    u'\u1030' is the MYANMAR VOWEL SIGN UU character, which is a non-spacing (combining) mark. It attaches to the preceding character and has no visible glyph of its own along the baseline, so all you end up seeing is the 8.


    The symbol you posted is not XOR. It is the Unicode character OLD ITALIC LETTER THE (U+10308):

    In [103]: print(u'\N{OLD ITALIC LETTER THE}')
    𐌈
    
    In [104]: print(u'\U00010308')
    𐌈
    

    The XOR Unicode character (U+22BB) is:

    In [105]: print(u'\N{XOR}')
    ⊻
    
    In [106]: print(u'\u22bb')
    ⊻
    

    Other characters you might find useful:

    In [110]: print(u'\N{CIRCLED PLUS}')
    ⊕
    
    In [111]: print(u'\N{CIRCLED TIMES}')
    ⊗
    
    In [112]: print(u'\N{N-ARY CIRCLED PLUS OPERATOR}')
    ⨁
    
    In [113]: print(u'\N{N-ARY CIRCLED TIMES OPERATOR}')
    ⨂
    

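    PS. You can find the Unicode name of (some) Unicode characters this way. This is Python 2, where a plain '...' literal is a byte string and must be decoded first; in Python 3 the literal is already text, so no .decode call is needed: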

    In [95]: import unicodedata as UD
    
    In [96]: UD.name('𐌈'.decode('utf-8'))
    Out[96]: 'OLD ITALIC LETTER THE'
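
The points above (4-digit \u versus 8-digit \U, the invisible combining mark, and name lookup) can be checked with a small helper. This is only a sketch, assuming Python 3, where every string literal is already Unicode; the function name describe and the variable names are made up for illustration and are not part of any library.

```python
import unicodedata


def describe(text):
    """Return one (code point, category, name) tuple per character in text."""
    rows = []
    for ch in text:
        rows.append((
            "U+{:04X}".format(ord(ch)),
            unicodedata.category(ch),           # 'Mn' means non-spacing mark
            unicodedata.name(ch, "<unnamed>"),  # default avoids ValueError
        ))
    return rows


four_digit = "\u10308"       # parsed as \u1030 followed by the literal '8'
eight_digit = "\U00010308"   # a single code point, U+10308

for label, s in (("\\u10308", four_digit), ("\\U00010308", eight_digit)):
    print(label, "->", len(s), "character(s)")
    for row in describe(s):
        print("   ", *row)

# Going the other way: from an official character name to the character.
xor = unicodedata.lookup("XOR")
print("U+{:04X}".format(ord(xor)), xor)
```

Running describe on any string that prints unexpectedly is a quick way to see whether an escape was split into several characters, and whether one of them is a combining mark that has no glyph of its own.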