python encoding mojibake

In Python, how would one change the string 'Nelson Vel\\xc3\\xa1zquez' to 'Nelson Velázquez'?


The issue is the encoding. The value is a str, but the Spanish á is not decoded properly and instead shows up as \xc3\xa1. How do you convert it so it shows 'Nelson Velázquez'?

Example variable:

text = 'Nelson Vel\\xc3\\xa1zquez'
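A quick check (not part of the original post) makes the problem concrete. Because of the doubled backslashes in the literal, the variable holds the eight characters \xc3\xa1 as plain text, not the two UTF-8 bytes C3 A1:

```python
text = 'Nelson Vel\\xc3\\xa1zquez'

# '\\' in a Python literal is a single backslash character, so the string
# contains the literal text \xc3\xa1 rather than the bytes 0xC3 0xA1.
print(text)               # Nelson Vel\xc3\xa1zquez
print(len(text))          # 23
print('\\' in text)       # True

# Slicing out the first escape shows it is four separate characters.
print(list(text[10:14]))  # ['\\', 'x', 'c', '3']
```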

Solution

  • If you have the string in the title, you have a double-encoding issue: the UTF-8 bytes of á were written out as the literal escape text \xc3\xa1. Reverse both layers:

    >>> s = 'Nelson Vel\\xc3\\xa1zquez'
    >>> s.encode('latin1').decode('unicode-escape').encode('latin1').decode('utf8')
    'Nelson Velázquez'
    

    Ideally, fix the problem at the source: the string was either read incorrectly or written to storage incorrectly.
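The one-liner above can be unpacked into its four steps. The sketch below is illustrative, and the helper name fix_escaped_utf8 is hypothetical. It assumes the input is plain ASCII plus \xNN escapes. If the string already contains real non-ASCII characters, the final UTF-8 decode can fail or give wrong results.

```python
def fix_escaped_utf8(s: str) -> str:
    """Undo text whose UTF-8 bytes were written out as literal \\xNN escapes.

    Hypothetical helper; it performs the same chain as the one-liner and
    assumes s is ASCII apart from those escapes.
    """
    # 1. str -> bytes. latin-1 maps code points 0-255 one-to-one onto bytes,
    #    so the literal backslash escapes pass through unchanged.
    raw = s.encode('latin-1')
    # 2. Interpret the escapes: the text \xc3 becomes the character U+00C3.
    #    The intermediate result is mojibake such as 'VelÃ¡zquez'.
    mojibake = raw.decode('unicode-escape')
    # 3. Map those code points back to the original bytes (U+00C3 -> 0xC3).
    utf8_bytes = mojibake.encode('latin-1')
    # 4. Decode the bytes as the UTF-8 they originally were.
    return utf8_bytes.decode('utf-8')


print(fix_escaped_utf8('Nelson Vel\\xc3\\xa1zquez'))  # Nelson Velázquez
```

Naming the intermediate values makes it easier to see which layer is wrong if a different input does not round-trip.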