How can I convert text in ISO-8859-1/latin1 to UTF-8 using Python 3.7.4 (32-bit)?
This is what I tried:
>>> inputText = "\xC4pple"
>>> inputText.decode('iso-8859-1').encode('utf8')
And it returned this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
What am I doing wrong?
decode is a member of the bytes type:
>>> help(bytes.decode)
Help on method_descriptor:

decode(self, /, encoding='utf-8', errors='strict')
    Decode the bytes using the codec registered for encoding.

    encoding
      The encoding with which to decode the bytes.
    errors
      The error handling scheme to use for the handling of decoding errors.
      The default is 'strict' meaning that decoding errors raise a
      UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
      as well as any other name registered with codecs.register_error that
      can handle UnicodeDecodeErrors.

So inputText needs to be of type bytes, not str:
>>> inputText = b"\xC4pple"
>>> inputText.decode('iso-8859-1')
'Äpple'
>>> inputText.decode('iso-8859-1').encode('utf8')
b'\xc3\x84pple'
Note that the result of decode is of type str, and the result of encode is of type bytes.
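To tie this together, here is a minimal sketch of the conversion as a reusable function. The name latin1_to_utf8 is hypothetical, chosen only for illustration. It also shows that if you already have a str, as in the question, Python 3 treats it as Unicode text, so it only needs to be encoded, not decoded first.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # bytes -> str using the source encoding, then str -> bytes as UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")


# Case 1: the input really is raw ISO-8859-1 bytes.
utf8_from_bytes = latin1_to_utf8(b"\xC4pple")
print(utf8_from_bytes)  # b'\xc3\x84pple'

# Case 2: the input is already a str, so encoding alone is enough.
text = "\xC4pple"
utf8_from_str = text.encode("utf-8")
print(utf8_from_str == utf8_from_bytes)  # True
```

Which case applies depends on where the data comes from: bytes read from a file opened in binary mode or from a socket need the decode step, while a str literal or text read from a file opened with the right encoding does not.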