
Convert a Unicode code point to a character in Python


I have a list of Unicode code points that I need to convert into characters in Python 2.7:

U+0021
U+0022
U+0023
.......
U+0024

How can I do that?


Solution

  • This regular expression will replace all U+nnnn sequences with the corresponding Unicode character:

    import re
    
    s = u'''\
    U+0021
    U+0022
    U+0023
    .......
    U+0024
    '''
    
    # For each U+hhhh match, parse the four hex digits and return that character.
    s = re.sub(ur'U\+([0-9A-F]{4})', lambda m: unichr(int(m.group(1), 16)), s)
    
    print(s)
    

    Output:

    !
    "
    #
    .......
    $
    

    Explanation:

    • unichr returns the character for a code point, e.g. unichr(0x21) == u'!'.
    • int('0021',16) converts a hexadecimal string to an integer.
    • lambda m: expression is an anonymous function that receives the regex match object.
      It is equivalent to def func(m): return expression, but written inline.
    • re.sub finds every match of a pattern and passes each match to a function that returns the replacement. Here the pattern is U+hhhh, where each h is an uppercase hexadecimal digit, and the replacement function converts the four-digit hexadecimal string into the corresponding Unicode character.
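
To make the steps in the explanation easier to follow, here is a minimal sketch of the same substitution with the lambda written out as a named function. The names code_to_char and replace_codes are hypothetical, chosen only for illustration. The try/except shim is an assumption added so the sketch also runs on Python 3, where chr takes the place of unichr; on Python 2.7 it does nothing.

```python
import re

try:
    unichr          # built in on Python 2
except NameError:
    unichr = chr    # assumption: on Python 3, chr() plays the role of unichr()

def code_to_char(match):
    """Replacement function: turn one U+hhhh match into its character."""
    hex_digits = match.group(1)        # the captured digits, e.g. u'0021'
    code_point = int(hex_digits, 16)   # hexadecimal string to integer, e.g. 33
    return unichr(code_point)          # integer to character, e.g. u'!'

def replace_codes(text):
    # Same pattern as in the answer: 'U+' followed by exactly four
    # uppercase hexadecimal digits.
    return re.sub(r'U\+([0-9A-F]{4})', code_to_char, text)

print(replace_codes(u'U+0021 U+0022 U+0023 U+0024'))  # prints: ! " # $
```

Passing a named function to re.sub behaves the same as passing the lambda; it just leaves room for comments on each step.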