Text file saved in wrong encoding in VS Code (on Ubuntu) leading to Unicode error


I'm working with Visual Studio Code under Lubuntu 18.04. The file encoding in VS Code is configured to be UTF-8, and the Python scripts have the encoding set to utf-8:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

The Python files contain some non-ASCII characters like in this example docstring:

"""
'Al final pudimos reparar el problema de registro de datos y se pudieron montar los
equipos para recoger algún dato más. ...'
"""

When I execute the scripts, I get the following error:

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte

Here is the traceback:

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/USERNAME/.vscode/extensions/ms-python.python-2020.6.90262/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/runpy.py", line 236, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "/home/USERNAME/Desktop/Python/Scripts/General/Import_export/import_EXCEL_spreadsheet_data_write_to_CSV.py", line 338

None of the many fixes proposed elsewhere worked for me: the error is raised when executing any Python script that contains non-ASCII characters, even if they appear only in comments or docstrings.


Solution

  • Finally, I found the cause of the entire problem: it lay in the settings.json file, where files.autoGuessEncoding was set to true:

    "files.autoGuessEncoding": true

    This option can override "files.encoding": "utf8", so even if you have defined a preferred encoding, VS Code may guess a different one. Thanks to a valuable hint from Brett Cannon, I noticed that the encoding shown in the bottom right corner of VS Code was sometimes automatically set to Windows 1252. A file saved that way stores "ú" as the single byte 0xfa, which is not a valid UTF-8 start byte, hence the error. This wrong guess led to the errors described in my initial question whenever I inserted umlauts ("äöü..") or other diacritics ("éúá..") somewhere in a script:

    1. Pylint reports an error right after the insertion: "error while code parsing: Wrong or no encoding specified for script.py."
    2. Running the script then produces the SyntaxError quoted above: (unicode error) 'utf-8' codec can't decode byte 0xfa in position 141: invalid start byte
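The mechanism behind both errors can be reproduced in memory without VS Code. The following is only a minimal sketch: the sample string is taken from the docstring in the question, and the variable names are made up for illustration.

```python
# Reproduce the decoding failure in memory (illustrative only).
text = "algún dato más"

# Bytes an editor writes when it saves the text as Windows-1252:
# "ú" becomes the single byte 0xfa.
cp1252_bytes = text.encode("cp1252")
assert cp1252_bytes[3] == 0xFA

# Bytes an editor writes when it saves the text as UTF-8:
# "ú" becomes the two-byte sequence 0xc3 0xba.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes[3:5] == b"\xc3\xba"

# Python 3 reads source files as UTF-8 by default, so the
# Windows-1252 bytes cannot be decoded.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)  # same wording as in the SyntaxError
    bad_offset = exc.start    # byte offset of the offending byte
    print(error_message)
```

The offset printed here refers to this short sample string, not to position 141 from the original traceback.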

    As noted in the linked discussion, VS Code's detection of the correct file encoding is still unreliable, which I can confirm.

    To resolve the problem, disable auto-detection by adding the following two lines to your settings.json (or set the corresponding options in the Settings GUI of VS Code):

    {
        ...,
        "files.encoding": "utf8",
        "files.autoGuessEncoding": false,
        ...
    }

    Now you can use any character you like in a text file or script, including umlauts ("äöü..") and other diacritics ("éúá..").
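To check whether any scripts in a project are still stored in a non-UTF-8 encoding, something like the following could be used. This is a rough sketch rather than part of the original answer: the function names `first_invalid_utf8_offset` and `find_non_utf8_files` are hypothetical, and the `*.py` pattern is an assumption about which files matter.

```python
from pathlib import Path
from typing import Dict, Optional


def first_invalid_utf8_offset(path: Path) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def find_non_utf8_files(root: Path, pattern: str = "*.py") -> Dict[Path, int]:
    """Map every matching file under root that is not valid UTF-8
    to the offset of its first undecodable byte."""
    result = {}
    for path in sorted(root.rglob(pattern)):
        offset = first_invalid_utf8_offset(path)
        if offset is not None:
            result[path] = offset
    return result
```

Calling `find_non_utf8_files(Path("."))` from the project folder returns an empty dictionary when every script decodes cleanly as UTF-8.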

    Finally, note that these settings do not change the encoding of files that were already created and saved. To convert such a file, left-click the encoding indicator at the bottom right of the VS Code window, then choose either to reopen or to save the file with the desired encoding, which will most likely be UTF-8.
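When many files are affected, the same conversion can be scripted instead of clicking through each file. The sketch below is an assumption-laden alternative to the GUI steps, not something from the original answer: `reencode_to_utf8` is a hypothetical helper, and it assumes the wrongly saved files really are Windows-1252 (`cp1252`). Back up the files before trying it.

```python
from pathlib import Path


def reencode_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Rewrite path as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left unchanged.
    """
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Assumption: the file was saved as source_encoding (cp1252 by default).
    # If it contains bytes undefined in that encoding, decode() raises
    # before anything is written, so the file stays untouched.
    text = data.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True
```

Working on raw bytes keeps the existing line endings intact, and files that already decode as UTF-8 are skipped, so running the helper twice does no harm.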

    As an aside, you can also change these settings in the GUI under File -> Preferences -> Settings instead of editing the settings.json file directly (which you can open via Ctrl + Shift + P and then "Preferences: Open Settings (JSON)").