For example, I can print Unicode symbol like:
print u'\u00E0'
Or
a = u'\u00E0'
print a
But it looks like I can't do something like this:
a = '\u00E0'
print someFunctionToDisplayTheCharacterRepresentedByThisCodePoint(a)
The main use case will be in loops. I have a list of unicode code points and I wish to display them on console. Something like:
with open("someFileWithAListOfUnicodeCodePoints") as uniCodeFile:
for codePoint in uniCodeFile:
print codePoint #I want the console to display the unicode character here
The file has a list of unicode code points. For example:
2109
OOBO
00E4
1F1E6
The loop should output:
℉
°
ä
🇦
Any help will be appreciated!
This is probably not a great way, but it's a start:
>>> x = '00e4'
>>> print unicode(struct.pack("!I", int(x, 16)), 'utf_32_be')
ä
First, we get the integer represented by the hexadecimal string x
. We pack that into a byte string, which we can then decode using the utf_32_be
encoding.
Since you are doing this a lot, you can precompile the struct:
int2bytes = struct.Struct("!I").pack
with open("someFileWithAListOfUnicodeCodePoints") as fh:
for code_point in fh:
print unicode(int2bytes(int(code_point, 16)), 'utf_32_be')
If you think it's clearer, you can also use the decode
method instead of the unicode
type directly:
>>> print int2bytes(int('00e4', 16)).decode('utf_32_be')
ä
Python 3 added a to_bytes
method to the int
class that lets you bypass the struct
module:
>>> str(int('00e4', 16).to_bytes(4, 'big'), 'utf_32_be')
"ä"