
Python: converting the hexadecimal escape \xE1 to á (a with a Spanish accent)


When I try to convert letters from hexadecimal codes to Spanish accented characters in my code, it doesn't work.

My IDE is Spyder.

Conversion table: [Transform to Spanish letters](https://i.sstatic.net/rGcZQ.png)

Example: three strings containing Spanish characters:

line3='RODR\xCDGUEZ' #->str
line4='Vel\xE1squez' #->str
line5='Andr\xE9s C\xe1ceres' #->str
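The mapping in the table can be checked directly: in a string literal, \xHH names the Unicode code point U+00HH, and for 0x00 to 0xFF those code points coincide with ISO-8859-1 (Latin-1). A minimal sketch using the three example strings:

```python
# In a literal, each \xHH escape becomes the single character with code point 0xHH.
line3 = 'RODR\xCDGUEZ'
line4 = 'Vel\xE1squez'
line5 = 'Andr\xE9s C\xe1ceres'

print(line3, line4, line5)               # RODRÍGUEZ Velásquez Andrés Cáceres
print(chr(0xE1), hex(ord('á')))          # á 0xe1

# The same code points read as Latin-1 bytes give the same characters.
print(bytes([0xCD]).decode('latin-1'))   # Í
```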

--------------------------------

Could someone help me solve this? The strings look the same, but Spyder handles them in two different ways and the results are not the same.

Case 1: correct

line2='RODR\xCDGUEZ'

Result in the Variable Explorer: 'RODRÍGUEZ'

Case 2: incorrect (modifying the string in code)

line='RODR=CDGUEZ' 

The "=" is replaced by "\x" in the next command:

line=line.replace('=','\\x')

The result does not contain Í:

'RODR\xCDGUEZ'

I don't understand why Spyder now doesn't recognize "Í" and keeps "\xCD" instead.
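A quick check shows that the two cases do not actually produce the same string, even though they print similarly. This sketch only compares lengths and reprs of the two values from the question:

```python
# Case 1: the parser turns \xCD into the single character 'Í' while reading the literal.
literal = 'RODR\xCDGUEZ'

# Case 2: replace() inserts a real backslash followed by 'x'; nothing re-parses the result.
built = 'RODR=CDGUEZ'.replace('=', '\\x')

print(len(literal), repr(literal))  # 9 'RODRÍGUEZ'
print(len(built), repr(built))      # 12 'RODR\\xCDGUEZ'
print(literal == built)             # False
```

The doubled backslash in the second repr is the giveaway: the string holds a literal backslash, "x", "C" and "D", not an escape sequence.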



Solution

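  • You cannot generate an escape code by building a string at runtime that merely looks like one. Escape sequences such as \xCD are interpreted only when Python parses a string literal in your source code; line.replace('=','\\x') just inserts a literal backslash followed by "x", so the result contains the characters \, x, C and D instead of Í. To have those characters interpreted, encode the string to a bytes object and decode the manually constructed escape codes with the unicode-escape codec: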

    >>> s = 'RODR=CDGUEZ'
    >>> s.replace('=','\\x').encode('ascii').decode('unicode-escape')
    'RODRÍGUEZ'
    

    I think what you really have is an (incomplete) email header, since encoded words in email headers use the =hh notation for non-ASCII bytes. What is missing is the character encoding those bytes are in. Below is a valid encoded email header that names it (iso-8859-1):

    >>> import email.header
    >>> for value, encoding in email.header.decode_header('=?iso-8859-1?q?RODR=CDGUEZ?='):
    ...    print(value.decode(encoding))
    ...
    RODRÍGUEZ
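If the data only ever contains =HH sequences, it can also be decoded without the unicode-escape round trip. The sketch below shows two options; decode_equals_hex and decode_quoted_printable are hypothetical helper names, and the default charset iso-8859-1 is an assumption about where the data came from. The first treats each =HH as a code point (the same result as \xHH in a literal); the second uses the standard quopri module and an explicit charset.

```python
import quopri
import re

# Matches "=" followed by exactly two hex digits, e.g. "=CD".
_EQ_HEX = re.compile(r'=([0-9A-Fa-f]{2})')


def decode_equals_hex(text: str) -> str:
    """Hypothetical helper: replace each =HH with the character U+00HH (Latin-1)."""
    return _EQ_HEX.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_quoted_printable(text: str, charset: str = 'iso-8859-1') -> str:
    """Hypothetical helper: decode =HH as quoted-printable bytes, then apply charset."""
    raw = quopri.decodestring(text.encode('ascii'))
    return raw.decode(charset)


print(decode_equals_hex('RODR=CDGUEZ'))                    # RODRÍGUEZ
print(decode_equals_hex('Andr=E9s C=E1ceres'))             # Andrés Cáceres
print(decode_quoted_printable('Vel=E1squez'))              # Velásquez
print(decode_quoted_printable('RODR=C3=8DGUEZ', 'utf-8'))  # RODRÍGUEZ
```

Unlike the unicode-escape approach, the regex version does not reinterpret other backslashes in the text and does not fail when the input already contains non-ASCII characters. If the bytes are really UTF-8 (Í would then appear as =C3=8D), only the quopri version with charset='utf-8' gives the right answer; the regex version would produce mojibake.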