
Why does this program (open() with encodings utf-8 / utf-8-sig) fail in some contexts but not in others?


Why does this program:

import sys

print("sys.getdefaultencoding()='%s'" % (sys.getdefaultencoding(), ))

with open("example.txt", "w", encoding="utf-8-sig", errors="replace") as f:
    f.write("test;Ilość sztuk\n")

with open("example.txt", "r", errors="strict") as rf:
    lr = rf.readline()
    print("lr=", lr)

run fine in some contexts but fail in others?

Example where it works:

$ python3 ./example.py 
sys.getdefaultencoding()='utf-8'
lr= test;Ilość sztuk

Version:

$ python3 --version
Python 3.6.8

Example where it fails:

sys.getdefaultencoding()='utf-8'
Traceback (most recent call last):
  File "./example.py", line 9, in <module>
    lr = rf.readline()
  File "/.../python/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
$

Version:

$ python3 --version
Python 3.6.8
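The byte 0xef reported at position 0 is the first byte of the UTF-8 byte order mark (BOM), which the utf-8-sig codec writes at the start of the file. A small sketch to check this by reading the file as raw bytes (it recreates example.txt so it can run on its own):

```python
import codecs

# Recreate the file exactly as the question's program does.
with open("example.txt", "w", encoding="utf-8-sig") as f:
    f.write("test;Ilość sztuk\n")

# Read the raw bytes, bypassing any text decoding (and thus the locale).
with open("example.txt", "rb") as bf:
    raw = bf.read()

print(raw[:3])                           # b'\xef\xbb\xbf'
print(raw.startswith(codecs.BOM_UTF8))   # True
```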

The contexts are Ubuntu 19.04, Ubuntu 18.04 and Debian 9, both inside and outside a chroot, with LANG set to "en_US.UTF-8" or "fr_FR.UTF-8"; none of these factors seems to decide success or failure.

In all cases, Python is installed by hand with the same options.

If you need the value of any environment variable, I can provide it.

I want exactly the same behaviour in all cases.
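To compare the working and failing environments, a sketch like the following could be run in each one. It prints the values that decide which codec is used; the particular list of environment variables is an assumption about what is most likely to differ:

```python
import locale
import os
import sys

# Encoding open() uses when encoding= is omitted.
print("locale.getpreferredencoding(False) =", locale.getpreferredencoding(False))
# Encoding used by print() via sys.stdout.
print("sys.stdout.encoding =", sys.stdout.encoding)
# Default for str.encode()/bytes.decode(); expected to be 'utf-8' on Python 3.
print("sys.getdefaultencoding() =", sys.getdefaultencoding())

# Locale-related variables: LC_ALL overrides LC_CTYPE, which overrides LANG.
for name in ("LC_ALL", "LC_CTYPE", "LANG", "PYTHONIOENCODING", "PYTHONUTF8"):
    print(name, "=", os.environ.get(name))
```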


Solution

  • In Python 3, there are different encoding defaults. The one you found, sys.getdefaultencoding(), tells you the default for the methods str.encode() and bytes.decode(). As far as I know, it's always UTF-8, no matter what build or implementation of Python you use.

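    However, if you omit the encoding=... parameter in a call to open(), then locale.getpreferredencoding(False) is used; the locale encoding is also used for sys.stdin, sys.stdout (print()!) and sys.stderr. The value of this default depends on the environment in which the Python interpreter is started, and the details of how it is determined vary between platforms. A common cause of an 'ascii' codec on Linux is that the locale named by LC_ALL, LC_CTYPE or LANG is not actually installed (or LC_ALL is set to C), in which case Python 3.6 falls back to ASCII. The PYTHONIOENCODING env variable only changes the encoding of the standard streams; it does not affect open(), so the reliable fix for the file is to always pass encoding= explicitly. As of Python 3.7, you can also launch Python with -X utf8 (or set PYTHONUTF8=1) to enable UTF-8 mode, which makes UTF-8 the default for open() as well.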
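Applied to the question's program, a minimal sketch of the fix is to name the encoding on both the write and the read. Because the file is written with utf-8-sig, reading it back with utf-8-sig also strips the BOM; plain utf-8 decodes too but leaves an invisible U+FEFF at the start of the line. The explicit ascii read at the end only simulates what an ASCII locale does in Python 3.6:

```python
# Write with a BOM, as in the question.
with open("example.txt", "w", encoding="utf-8-sig", errors="replace") as f:
    f.write("test;Ilość sztuk\n")

# An explicit encoding makes the read independent of the locale.
with open("example.txt", "r", encoding="utf-8-sig", errors="strict") as rf:
    lr = rf.readline()
print("lr=", lr)  # lr= test;Ilość sztuk  (BOM stripped)

# Plain "utf-8" also works, but keeps the BOM as U+FEFF.
with open("example.txt", "r", encoding="utf-8") as rf:
    with_bom = rf.readline()
print(repr(with_bom[0]))  # '\ufeff'

# Simulate the failing environment: ASCII as the decoding codec.
ascii_failed = False
try:
    with open("example.txt", "r", encoding="ascii", errors="strict") as rf:
        rf.readline()
except UnicodeDecodeError as exc:
    ascii_failed = True  # same kind of error as in the question's traceback
    print("ascii:", exc)
```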