Python's default encoding got me confused.
A text file contains the character á. The file is saved as UTF-8 in Notepad. When I don't specify the encoding='utf-8' argument in:
    with open(filename, encoding='utf-8') as f:
        for line in f:
            print(line)
it shows up as Ã¡. When I do add the encoding='utf-8' part, it shows up as á.
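A minimal reproduction of the two outputs might look like the sketch below. It assumes the file holds the two UTF-8 bytes for á with no BOM (a "UTF-8 with BOM" file would add extra characters at the start), and it uses cp1252 as a stand-in for the Windows locale encoding; both of those are assumptions, not something the question states.

```python
import os
import tempfile

# Write the UTF-8 bytes for "á" (0xC3 0xA1) plus a newline, assuming no BOM.
data = "á\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Read back with an explicit encoding to mimic both cases from the question.
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()
    # cp1252 is an assumed stand-in for the locale encoding used when encoding= is omitted.
    with open(path, encoding="cp1252") as f:
        as_cp1252 = f.read()
finally:
    os.remove(path)

print(repr(as_utf8))    # 'á\n'
print(repr(as_cp1252))  # 'Ã¡\n'
```

The same bytes on disk give different text depending only on the codec used to decode them.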
I am wondering what sys.getdefaultencoding() is useful for, since it returns utf-8, yet I still had to pass encoding='utf-8' for the á to show up correctly in the output.
I'm using Python 3.
Extra edit:
I think the encoding being used is latin-1 extended, since á in UTF-8 is the byte pair 0xC3 0xA1, and in latin-1 extended 0xC3 maps to Ã and 0xA1 maps to ¡.
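The byte-level reasoning above can be checked directly. This sketch decodes each UTF-8 byte of á on its own with cp1252 (the Windows "extended Latin-1" code page) and with plain latin-1; in the 0xA0 to 0xFF range the two codecs agree, so either one explains the Ã¡ output.

```python
# Encode "á" as UTF-8, then decode each byte separately with single-byte codecs.
utf8_bytes = "á".encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa1'

for b in utf8_bytes:
    one_byte = bytes([b])
    # cp1252 and latin-1 map bytes 0xA0-0xFF to the same characters.
    print(f"0x{b:02X} -> cp1252 {one_byte.decode('cp1252')!r}, "
          f"latin-1 {one_byte.decode('latin-1')!r}")

# Decoding the whole sequence with cp1252 reproduces the garbled text.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # Ã¡
```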
How could I verify that latin-1 extended will be used when not specifying encoding?
Read the docs in Built-in Functions -> open():
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
…
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
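To answer the "how could I verify" part, a small sketch like this prints the locale encoding that open() falls back to, then cross-checks it against the encoding attribute of a file opened without encoding=. On many Western-European Windows setups both are likely to be cp1252, but that depends on the machine's locale settings.

```python
import locale
import os
import tempfile

# The fallback encoding named in the open() docs.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Cross-check: a text file opened without encoding= reports the codec it actually uses.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
try:
    with open(path) as f:
        file_encoding = f.encoding
    print(file_encoding)
finally:
    os.remove(path)
```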
…
where locale.getpreferredencoding(do_setlocale=True)
Return the encoding used for text data, according to user preferences.
sys.getdefaultencoding() is different (and independent):
Return the name of the current default string encoding used by the Unicode implementation.
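The difference can be seen side by side. In Python 3, sys.getdefaultencoding() returns 'utf-8' and is what str.encode() and bytes.decode() use when called without an argument; it plays no part in choosing the default for open(), which follows the locale as quoted above.

```python
import locale
import sys

# Default codec for str.encode()/bytes.decode(); 'utf-8' in Python 3.
print(sys.getdefaultencoding())  # utf-8

# str.encode() with no argument therefore produces UTF-8 bytes...
encoded = "á".encode()
print(encoded)  # b'\xc3\xa1'

# ...while open() without encoding= uses the locale encoding, which may differ.
print(locale.getpreferredencoding(False))
```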