
constructing a Unicode string


I'm trying to construct and print a Unicode string with Python 3.x. So, for example, the following works fine:

a = '\u0394'
print(a)
Δ

But if I try to construct this by appending two strings, I have several problems:

a = '\u'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

a = '\\u'
b = '0394'
c = a + b
print(c)
\u0394

What am I missing here?


Solution

  • \uhhhh is an escape sequence, a notation used in string literals. You can't construct that notation from parts, at least not directly like that.

    Generally, you'd use the chr() function to produce individual characters from an integer instead:

    >>> chr(int('0394', 16))
    'Δ'
    

    Here the hex string 0394 is first interpreted as an integer in base 16, and chr() then returns the character with that code point.

    If you must start from the Python string literal escape notation, decode it with codecs.decode() and the unicode_escape codec:

    >>> import codecs
    >>> r'\u' + '0394'
    '\\u0394'
    >>> codecs.decode(r'\u' + '0394', 'unicode_escape')
    'Δ'