Tags: python, unicode, encoding, utf-8, latin1

Converting utf-8 to latin-1 in Python


I want to do this:

Take the bytes of this utf-8 string:

访视频

Encode those bytes in latin-1 and print the result:

è®¿è§†é¢‘

How do I do this in Python?

# -*- coding: utf-8 -*-
s = u'访视频'.encode('latin-1')

Causes this exception:

s = u'访视频'.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

Solution

  • What you're asking to do is literally impossible. You can't encode those characters to Latin-1, because those characters don't exist in Latin-1.

    To get the output you want, decode the UTF-8 bytes as if they were Latin-1. Like this:

    s = u'访视频'.encode('utf-8').decode('latin-1')
    

    However, your desired output isn't actual Latin-1: in Latin-1, bytes \x86 and \x91 map to non-printable control characters, so you're going to get this:

    è®¿è§ é¢
    

    (Notice the gap in the middle in place of †, and the missing ‘ at the end; those are actually invisible control characters, not spaces.)

    It looks like you want a Latin-1 superset, probably Windows code page 1252, in which case what you really want is:

    s = u'访视频'.encode('utf-8').decode('cp1252')
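
To see the difference between the two decodings side by side, here is a minimal sketch, assuming Python 3 (where source files are UTF-8 by default and `ascii()` is available). The variable names are only illustrative.

```python
text = u'访视频'

# The nine UTF-8 bytes: e8 ae bf  e8 a7 86  e9 a2 91
utf8_bytes = text.encode('utf-8')

# Latin-1 maps every byte 0x00-0xFF to the code point of the same value,
# so this decode never fails, but 0x86 and 0x91 come out as invisible
# C1 control characters.
as_latin1 = utf8_bytes.decode('latin-1')

# cp1252 assigns printable glyphs to most of the 0x80-0x9F range,
# including 0x86 (dagger) and 0x91 (left single quotation mark).
as_cp1252 = utf8_bytes.decode('cp1252')

# ascii() escapes everything non-ASCII, which makes the control
# characters visible.
print(ascii(as_latin1))  # '\xe8\xae\xbf\xe8\xa7\x86\xe9\xa2\x91'
print(as_cp1252)         # è®¿è§†é¢‘

# Reversing the steps recovers the original text.
restored = as_cp1252.encode('cp1252').decode('utf-8')
assert restored == text
```

One caveat: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so decoding arbitrary UTF-8 bytes as cp1252 can raise UnicodeDecodeError. It happens to work for this particular string; Latin-1 decoding never raises.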