python python-3.x unicode python-unicode

python3 encode replace unicode characters


According to the documentation, the following command

'Brückenspinne'.encode("utf-8",errors='replace')

should give me the byte sequence b'Br??ckenspinne'. However, the Unicode characters are not replaced but encoded anyway:

b'Br\xc3\xbcckenspinne'

Can you tell me how to actually eliminate Unicode characters? (I use 'replace' for testing purposes; I intend to use 'xmlcharrefreplace' later. To be totally honest, I want to convert the Unicode characters to their XML character references, keeping everything as a string.)


Solution

  • UTF-8 can represent the character ü, so no encoding error occurs and the errors handler is never invoked; that is why nothing is replaced.

    To see errors='replace' in action, use an encoding that cannot represent the character, for example ascii:

    >>> 'Brückenspinne'.encode("ascii", errors='replace')
    b'Br?ckenspinne'
    
    >>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
    b'Br&#252;ckenspinne'
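
The question also asks to keep everything as a string. One way to do that, as a minimal sketch rather than the only approach, is to encode to ASCII with 'xmlcharrefreplace' and decode the resulting bytes back to str. The helper name to_xmlcharrefs is made up for this illustration:

```python
def to_xmlcharrefs(text: str) -> str:
    """Return text with every non-ASCII character replaced by a
    decimal XML character reference (hypothetical helper name)."""
    # Encoding to ASCII invokes the error handler once per character
    # that ASCII cannot represent. The result contains only ASCII
    # bytes, so decoding it back to str cannot fail.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_xmlcharrefs("Brückenspinne"))  # Br&#252;ckenspinne
```

ASCII characters pass through unchanged, so markup-significant characters such as & and < are not escaped by this step.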