
How to convert ISO-8859-1 to UTF-8 using Python 3.7.4


How can I convert text in ISO-8859-1/latin1 to UTF-8 using Python 3.7.4 (32-bit)?

This is what I tried:

>>> inputText = "\xC4pple"
>>> inputText.decode('iso-8859-1').encode('utf8')

And it returned this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

What am I doing wrong?


Solution

  • decode is a method of the bytes type, not of str. In Python 3 a str is already Unicode text, so there is nothing left to decode:

    >>> help(bytes.decode)
    Help on method_descriptor:
    
    decode(self, /, encoding='utf-8', errors='strict')
        Decode the bytes using the codec registered for encoding.
        
        encoding
          The encoding with which to decode the bytes.
        errors
          The error handling scheme to use for the handling of decoding errors.
          The default is 'strict' meaning that decoding errors raise a
          UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
          as well as any other name registered with codecs.register_error that
          can handle UnicodeDecodeErrors.
    

    So inputText needs to be of type bytes, not str:

    >>> inputText = b"\xC4pple"
    >>> inputText.decode('iso-8859-1')
    'Äpple'
    >>> inputText.decode('iso-8859-1').encode('utf8')
    b'\xc3\x84pple'
    

    Note that decode returns a str (Unicode text) and encode returns bytes, so the round trip above goes bytes -> str -> bytes.
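
The conversion above can be wrapped in a small helper. This is only a minimal sketch: the function name latin1_to_utf8 is made up for illustration and is not part of the standard library. It also covers the situation in the question, where the input is already a str. In that case "\xC4" is the code point U+00C4, so the text only needs to be encoded.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Decode to str (Unicode text) first, then encode to the target encoding.
    return data.decode("iso-8859-1").encode("utf-8")


# Case 1: the input is raw bytes in ISO-8859-1.
raw = b"\xC4pple"
converted = latin1_to_utf8(raw)
print(converted)  # b'\xc3\x84pple'

# Case 2: the input is already a str, as in the question.
# There is nothing to decode; encoding to UTF-8 is the only step needed.
text = "\xC4pple"
print(text.encode("utf-8"))  # b'\xc3\x84pple'
```

Both cases produce the same UTF-8 bytes, because the str literal and the decoded bytes represent the same text, 'Äpple'.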