I wanted to loop over Unicode characters in Python like this:
hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                print(my_char)
As expected this printed out:
\u0000
\u0001
...
\uffff
Then I tried to change the code above to print the corresponding characters instead of the escape sequences:
hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                eval("print(my_char)")
But this outputs the same as the code before. I also tried passing the whole expression to eval as a string:
hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")
That variant raises the following error message:
    eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")
  File "<string>", line 1
    print(\u0000)
           ^
SyntaxError: unexpected character after line continuation character
What would make this code work as intended?
Python strings are Unicode already. Unicode isn't some kind of escape sequence; it's a standard that assigns every character a number, its code point. Encodings such as UTF-8 then map those code points to bytes.
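To make the distinction concrete, here is a small sketch (the variable names are only illustrative). A raw string containing \u0439 is just six ordinary characters, the same text in a normal literal is resolved by the parser to a single character, and producing bytes is a separate encoding step:

```python
# A raw string keeps the backslash: this is six separate characters.
raw = r"\u0439"
# In a normal literal the parser resolves the escape to one character.
cooked = "\u0439"

print(len(raw))     # 6
print(len(cooked))  # 1
print(ord(cooked))  # 1081, the code point of that character

# Encoding is a separate step that turns the code point into bytes.
encoded = cooked.encode("utf-8")
print(encoded)      # b'\xd0\xb9'
```

This is also why concatenating r"\u" with four digits at run time never yields the character itself: the escape is only interpreted when a string literal is parsed.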
Given that fact, you can use chr to convert a Unicode code point to a string with that character, e.g. print(chr(1081)). As the function's docs say:
Return the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a', while chr(8364) returns the string '€'. This is the inverse of ord().
The valid range for the argument is from 0 through 1,114,111.
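Applied to the approach in the question, one possible sketch is to keep building the four hex digits as a string, parse them with int(..., 16), and pass the result to chr. The helper name char_from_hex and the use of itertools.product in place of the nested loops are choices made for this example only:

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def char_from_hex(digits: str) -> str:
    """Return the character whose code point is given as a hex string."""
    return chr(int(digits, 16))


# Build every four-digit combination, like the nested loops in the question.
codes = ["".join(d) for d in product(HEX_DIGITS, repeat=4)]

print(len(codes))                      # 65536
print(codes[0], codes[-1])             # 0000 ffff
print(char_from_hex("0041"))           # A
print(char_from_hex("20ac") == "€")    # True
```

No eval is needed, because nothing has to be re-parsed as Python source.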
A simple loop can generate all valid characters. Actually printing them is another matter:
for i in range(0, 1114112):
    print(chr(i))
Running this on my machine eventually fails with
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
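That value is a surrogate code point. Surrogates (U+D800 through U+DFFF) have no UTF-8 encoding, so the character couldn't be converted into a form that can be printed on my terminal, which uses UTF-8.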
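One way around this, sketched here under the assumption that you simply want to skip whatever cannot be encoded, is to leave the surrogate range out of the loop. The generator name encodable_chars is hypothetical:

```python
def encodable_chars(start=0, stop=0x110000):
    """Yield every character in [start, stop) except surrogates."""
    for i in range(start, stop):
        # U+D800..U+DFFF are surrogates and have no UTF-8 encoding.
        if 0xD800 <= i <= 0xDFFF:
            continue
        yield chr(i)


# Every yielded character can be encoded as UTF-8 without an error.
total = sum(1 for c in encodable_chars() if c.encode("utf-8"))
print(total)  # 1112064
```

Replacing the counting line with `for c in encodable_chars(): print(c)` prints the characters instead. Many of the remaining code points are control characters or unassigned, so the output can still look odd even though nothing raises.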