Tags: python, python-3.x, unicode, codec

Why isn't Python writing characters from Latin Extended-A (UnicodeEncodeError when writing to a file)?


Obligatory intro noting that I've done some research

This seems like it should be straightforward (I am happy to close as a duplicate if a suitable target question is found), but I'm not familiar enough with character encodings and how Python handles them to suss it out myself. At risk of seeming lazy, I will note the answer very well may be in one of the links below, but I haven't yet seen it in my reading.

I've referenced some of the docs: Unicode HOWTO, codecs.py docs

I've also looked at some old highly-voted SO questions: Writing Unicode text to a text file?, Python, Unicode, and the Windows console


Question

Here's an MCVE that demonstrates my problem:

with open('foo.txt', 'wt') as outfile:
    outfile.write('\u014d')

The traceback is as follows:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\cashamerica\AppData\Local\Programs\Python\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 0: character maps to <undefined>

I'm confused because U+014D is 'ō', an assigned code point: LATIN SMALL LETTER O WITH MACRON (official Unicode source).
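
The assignment can also be checked from Python itself. This is a minimal sketch using the standard unicodedata module; the variable names are only illustrative:

```python
import unicodedata

char = '\u014d'

# Look up the character's official name and general category
# in Python's bundled Unicode database.
char_name = unicodedata.name(char)
char_category = unicodedata.category(char)

print(char_name)      # LATIN SMALL LETTER O WITH MACRON
print(char_category)  # Ll (lowercase letter)
```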

I can even print the character to the Windows console (but it renders as a plain 'o'):

>>> print('\u014d')
o

Solution

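  • When open() is called without an encoding argument, it falls back to the platform's default encoding. Here that is cp1252 (note cp1252.py in the traceback), which has no mapping for ō.

    Write (and read) your file with an explicit encoding: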

    with open('foo.txt', 'wt', encoding='utf8') as outfile:
        outfile.write('\u014d')
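
To see the difference without depending on the machine's default, here is a rough sketch that tests both encodings directly and round-trips the character through a file. The helper can_encode and the temporary file path are made up for illustration, and locale.getpreferredencoding(False) is assumed to report the same default that open() used in the question:

```python
import locale
import os
import tempfile

CHAR = '\u014d'  # LATIN SMALL LETTER O WITH MACRON

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)


def can_encode(text, encoding):
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


cp1252_ok = can_encode(CHAR, 'cp1252')  # cp1252 has no slot for U+014D
utf8_bytes = CHAR.encode('utf-8')       # UTF-8 can encode any code point

# Round trip through a temporary file, naming the encoding on both sides.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'wt', encoding='utf-8') as outfile:
    outfile.write(CHAR)
with open(path, 'rt', encoding='utf-8') as infile:
    round_tripped = infile.read()

print(default_encoding)       # varies by platform and settings
print(cp1252_ok)              # False
print(utf8_bytes)             # b'\xc5\x8d'
print(round_tripped == CHAR)  # True
```

Reading the file back with a different encoding than the one used to write it would either fail or silently produce the wrong characters, which is why the encoding is passed on both the write and the read.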