I'm working with Visual Studio Code under Lubuntu 18.04. The file encoding in VS Code is configured to be UTF-8, and the Python scripts have the encoding set to utf-8:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
The Python files contain some non-ASCII characters like in this example docstring:
"""
'Al final pudimos reparar el problema de registro de datos y se pudieron montar los
equipos para recoger algún dato más. ...'
"""
When I execute the scripts, I get the following error:
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte
Here is the traceback:
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 236, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "/home/USERNAME/Desktop/Python/Scripts/General/Import_export/import_EXCEL_spreadsheet_data_write_to_CSV.py", line 338
None of the numerous proposals worked for me, since this error is thrown when executing any Python script containing non-ASCII characters even in comments or docstrings.
Finally, I found the cause of the entire problem:
It lay in the settings.json file, where auto-guessing of the encoding was set to true:
"files.autoGuessEncoding": true
This option is able to override "files.encoding": "utf8", so even if you have defined a preferred encoding, VS Code may still guess a different one.
Thanks to a valuable hint from Brett Cannon, I noticed that the file's encoding shown in the bottom right corner of VS Code was indeed sometimes (automatically) set to Windows 1252. This unfortunate guess, caused by VS Code's option "files.autoGuessEncoding": true, led to the common errors mentioned above in my initial question (whenever I inserted umlauts ("äöü..") or diacritics ("éúá..") somewhere in my script):
"error while code parsing: Wrong or no encoding specified for script.py."
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte
As stated in the discussion related to the aforementioned link, VS Code is still somewhat inaccurate when it comes to detecting the correct file encoding, which I can confirm.
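To illustrate why the byte 0xfa shows up in the error message, here is a minimal sketch. It assumes the file was saved as Windows 1252 and uses the word "algún" from the example docstring. In Windows 1252 the character "ú" is the single byte 0xfa, which is not a valid start byte in UTF-8, so Python's UTF-8 decoder rejects it:

```python
# "ú" (as in "algún") is one byte in Windows-1252 but two bytes in UTF-8.
text = "algún"

cp1252_bytes = text.encode("cp1252")  # what VS Code writes after guessing Windows 1252
utf8_bytes = text.encode("utf-8")     # what Python expects for "coding: utf-8"

print(cp1252_bytes)  # b'alg\xfan'
print(utf8_bytes)    # b'alg\xc3\xban'

# Decoding the Windows-1252 bytes as UTF-8 fails in the same way as the script did.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = str(exc)
    print(decode_error)
```

The position reported here differs from the one in my traceback, since it depends on where the offending character sits in the data being decoded.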
To resolve this problem for good, disable auto-detection by putting the following two lines in your settings.json (or set the corresponding options in the settings GUI of VS Code):
{...,
"files.encoding": "utf8",
"files.autoGuessEncoding": false,
...
}
Now it is possible to use any character you like within a text file or script, such as umlauts ("äöü..") and diacritics ("éúá..").
Finally, note that the above-mentioned settings won't change the encoding of files that have already been created and saved.
For this to happen, you need to left-click on the encoding in the bottom right of the VS Code window, then either reopen or save with your desired encoding, which will most likely be utf8.
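If many files are affected, clicking through each one gets tedious. As an alternative, here is a rough sketch (not something VS Code provides) of how the same conversion could be scripted in Python. The helper names `is_valid_utf8` and `reencode_to_utf8` are made up for this example, and the source encoding is assumed to be Windows 1252, which may not hold for every file:

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_to_utf8(path: Path, assumed_encoding: str = "cp1252") -> bool:
    """Rewrite `path` as UTF-8 if it is not valid UTF-8 already.

    `assumed_encoding` is only a guess (Windows 1252 by default).
    Returns True if the file was rewritten, False if it was left untouched.
    """
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False
    # Decode with the assumed encoding, then write the same text back as UTF-8.
    # Working on bytes avoids any newline translation.
    path.write_bytes(data.decode(assumed_encoding).encode("utf-8"))
    return True
```

You could call `reencode_to_utf8(p)` for each `p` in something like `Path("Scripts").rglob("*.py")` (the directory name is just a placeholder). Make a backup first: if a file was actually saved in some other encoding, the guess will produce wrong characters, and a few byte values are undefined in Python's cp1252 codec and will raise an error instead.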
As an aside regarding the settings, note that you can also change them via the GUI under File -> Preferences -> Settings instead of editing the settings.json file (which you can open via Ctrl + Shift + P and then "Preferences: Open Settings (JSON)").