
How is Python's sys.getdefaultencoding() used?


I'm confused by Python's default encoding.

A text file contains the character á. The file is saved as UTF-8 in Notepad. When I don't specify encoding='utf-8' in:

with open(filename, encoding='utf-8') as f:
    for line in f:
        print(line)

it shows up as Ã¡. When I do add the encoding='utf-8' part, it shows up as á.

I am wondering what sys.getdefaultencoding() is useful for: it returns utf-8, yet I still had to specify encoding='utf-8' for á to show up correctly in the output.

I'm using Python 3.

Extra edit:

The encoding being used is probably an extended Latin-1 (for example Windows-1252), since á in UTF-8 is the two bytes 0xC3 0xA1, while in Latin-1 0xC3 maps to Ã and 0xA1 maps to ¡.
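The byte mapping described above can be reproduced in a few lines. This sketch assumes the mis-decoding codec is Windows-1252 (Python's 'cp1252'); plain 'latin-1' decodes these two particular bytes identically:

```python
ch = "\u00e1"  # the character "á" (U+00E1)

# UTF-8 stores "á" as two bytes: 0xC3 0xA1.
raw = ch.encode("utf-8")
print(raw)  # b'\xc3\xa1'

# A single-byte codec turns each of those bytes into its own character.
print(raw.decode("cp1252"))   # Ã¡
print(raw.decode("latin-1"))  # Ã¡

# Decoding with the codec the file was actually saved in restores "á".
print(raw.decode("utf-8"))    # á
```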

How can I verify that this extended Latin-1 encoding is what gets used when no encoding is specified?


Solution

  • Read the docs in Built-in Functions -> open():

    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

    In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

    where locale.getpreferredencoding(do_setlocale=True) is documented as:

    Return the encoding used for text data, according to user preferences.
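To check what open() actually picks on a given machine, one can compare the locale encoding with the encoding attribute of a file opened without encoding=. A minimal sketch; the temporary file sample.txt is a hypothetical stand-in for the Notepad-saved file, and errors="replace" is added only so the demo cannot crash on a locale codec that rejects the bytes:

```python
import codecs
import locale
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the Notepad file: the UTF-8 bytes for "á".
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write("\u00e1".encode("utf-8"))

    # No encoding= given: open() falls back to the platform default.
    # errors="replace" only prevents a crash if that codec rejects the bytes.
    with open(path, errors="replace") as f:
        implicit = f.encoding
        implicit_text = f.read()

    # Explicit encoding: same result on every platform.
    with open(path, encoding="utf-8") as f:
        explicit_text = f.read()

preferred = locale.getpreferredencoding(False)
print("locale.getpreferredencoding(False):", preferred)
print("open() without encoding= used:", implicit)
print("same codec:", codecs.lookup(implicit).name == codecs.lookup(preferred).name)
# ascii() shows escapes, e.g. '\xc3\xa1' for the mis-decoded Ã¡ under cp1252.
print("implicit read:", ascii(implicit_text))
print("explicit read:", ascii(explicit_text))  # '\xe1'
```

On a Western-European Windows setup the first two lines would typically both name cp1252, which is the "extended Latin-1" seen in the question; passing encoding='utf-8' sidesteps the platform default entirely.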

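    sys.getdefaultencoding() is different (and independent of the locale): it names the codec that str.encode() and bytes.decode() use when no encoding is passed, and in Python 3 it is always 'utf-8'. Its documentation says: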

    Return the name of the current default string encoding used by the Unicode implementation.
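A short sketch contrasting the two settings: sys.getdefaultencoding() governs the argument-less str.encode()/bytes.decode() calls, while open() follows the locale. The last printed value depends on the machine this runs on:

```python
import locale
import sys

ch = "\u00e1"  # the character "á"

# In Python 3 the default string encoding is fixed; there is no sys.setdefaultencoding().
print(sys.getdefaultencoding())  # utf-8

# It is the codec str.encode() and bytes.decode() use when called without an encoding.
encoded = ch.encode()
print(encoded)                 # b'\xc3\xa1'
print(encoded.decode() == ch)  # True

# open() without encoding= consults the locale instead, so this can differ,
# for example 'cp1252' on many Western-European Windows systems.
print(locale.getpreferredencoding(False))
```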