
List with hex escaped values to readable string in Python


I have a list like this:

['<option value="284">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 Historia </option>', '<option value="393">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 H\xc3\xa4lsa & sk\xc3\xb6nhet </option>']

How do I convert this list into a list with elements that are actually readable?

I believe it is in ISO 8859-1.


Solution

  • Decode each string with the .decode() method; the data is actually UTF-8, not ISO 8859-1:

    >>> print lst[0].decode('utf8')
    <option value="284">     Historia </option>
    >>> print lst[1].decode('utf8')
    <option value="393">     Hälsa & skönhet </option>
    

    Each leading \xc2\xa0 byte pair is the UTF-8 encoding of Unicode code point U+00A0, a non-breaking space (&nbsp; as an HTML entity):

    >>> lst[0].decode('utf8')
    u'<option value="284">\xa0\xa0\xa0\xa0 Historia </option>'
    >>> lst[1].decode('utf8')
    u'<option value="393">\xa0\xa0\xa0\xa0 H\xe4lsa & sk\xf6nhet </option>'
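
To convert the whole list at once rather than one element at a time, a list comprehension should do. The following is a minimal sketch written for Python 3, where the undecoded data would be bytes objects rather than Python 2 str values; the names raw, decode_all and readable are illustrative and not from the original answer.

```python
# Sketch for Python 3: the undecoded data is assumed to be a list of bytes.
raw = [
    b'<option value="284">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 Historia </option>',
    b'<option value="393">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 H\xc3\xa4lsa & sk\xc3\xb6nhet </option>',
]


def decode_all(items, encoding="utf-8"):
    """Return a new list with every byte string decoded to text."""
    return [item.decode(encoding) for item in items]


readable = decode_all(raw)
for line in readable:
    print(line)
```

In Python 2 the same comprehension works directly on the original list: [s.decode('utf8') for s in lst]. If the non-breaking spaces are unwanted, calling .replace(u'\xa0', u' ') on each decoded string swaps them for ordinary spaces.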