How to print Unicode like “u{variable}” in Python 2.7?


For example, I can print a Unicode symbol like this:

print u'\u00E0'

Or

a = u'\u00E0'
print a

But it looks like I can't do something like this:

a = '\u00E0'
print someFunctionToDisplayTheCharacterRepresentedByThisCodePoint(a)

The main use case will be in loops. I have a list of Unicode code points and I wish to display them on the console. Something like:

with open("someFileWithAListOfUnicodeCodePoints") as uniCodeFile:
    for codePoint in uniCodeFile:
        print codePoint #I want the console to display the unicode character here

The file has a list of unicode code points. For example:

2109
00B0
00E4
1F1E6

The loop should output:

℉
°
ä
🇦  

Any help will be appreciated!


Solution

  • This is probably not a great way, but it's a start:

    >>> import struct
    >>> x = '00e4'
    >>> print unicode(struct.pack("!I", int(x, 16)), 'utf_32_be')
    ä
    

    First, we get the integer represented by the hexadecimal string x. We then pack it into a four-byte, big-endian byte string (the "!I" format), which is exactly the layout that the utf_32_be codec decodes into a single character.

    Since you are doing this a lot, you can precompile the struct:

    import struct

    int2bytes = struct.Struct("!I").pack
    with open("someFileWithAListOfUnicodeCodePoints") as fh:
        for code_point in fh:
            print unicode(int2bytes(int(code_point, 16)), 'utf_32_be')
    

    If you think it's clearer, you can also use the decode method instead of the unicode type directly:

    >>> print int2bytes(int('00e4', 16)).decode('utf_32_be')
    ä
    

    Python 3 added a to_bytes method to the int class that lets you bypass the struct module:

    >>> str(int('00e4', 16).to_bytes(4, 'big'), 'utf_32_be')
    'ä'
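
Putting the steps of this answer together, a minimal sketch might look like the following. It is written to run unchanged on Python 2.7 and Python 3. The names code_point_to_char, _pack_u32, sample_file and chars are made up for illustration, and the in-memory io.StringIO object is only a stand-in for the question's file, assuming one hexadecimal code point per line.

```python
from __future__ import print_function  # harmless on Python 3, needed for 2.7

import io
import struct

# Precompiled packer: one unsigned 32-bit integer in big-endian order.
_pack_u32 = struct.Struct("!I").pack


def code_point_to_char(hex_text):
    """Return the character for a hexadecimal code point such as '00E4'."""
    number = int(hex_text.strip(), 16)  # '00E4' -> 228
    raw = _pack_u32(number)             # 228 -> b'\x00\x00\x00\xe4'
    return raw.decode("utf_32_be")      # four bytes -> one Unicode character


# Hypothetical stand-in for the question's file (assumption: one code
# point per line, blank lines ignored).
sample_file = io.StringIO(u"2109\n00B0\n00E4\n1F1E6\n")

chars = [code_point_to_char(line) for line in sample_file if line.strip()]
```

Printing each element of chars then shows the characters, provided the console's encoding can represent them. On a narrow Python 2.7 build, a code point above FFFF such as 1F1E6 comes back as a surrogate pair rather than a single code unit.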