I have a list of Unicode character codes I need to convert into chars on python 2.7.
U+0021
U+0022
U+0023
.......
U+0024
How to do that?
This regular expression will replace all U+nnnn
sequences with the corresponding Unicode character:
import re
s = u'''\
U+0021
U+0022
U+0023
.......
U+0024
'''
s = re.sub(ur'U\+([0-9A-F]{4})',lambda m: unichr(int(m.group(1),16)),s)
print(s)
Output:
!
"
#
.......
$
Explanation:
unichr
gives the character of a codepoint, e.g. unichr(0x21) == u'!'
.int('0021',16)
converts a hexadecimal string to an integer.lambda(m): expression
is an anonymous function that receives the regex match.def func(m): return expression
but inline.re.sub
matches a pattern and sends each match to a function that returns the replacement. In this case, the pattern is U+hhhh
where h is a hexadecimal digit, and the replacement function converts the hexadecimal digit string into a Unicode character.