Tags: python, utf-8, pycharm

UnicodeDecodeError: 'utf-8' when debugging Python files in PyCharm Community


Current conclusion:

I converted the file's encoding in this order: utf-8 -> utf-8 big -> ansi -> utf-8, reopening the file after each conversion.

After observing for a while, the error has not come back.


When I debug .py files in PyCharm, the same file sometimes raises a UnicodeDecodeError and sometimes runs normally. My operating system is Windows 10, and my PyCharm version is 2020.3.3 Community Edition.

The error is as follows:

Traceback (most recent call last):
  File "D:\Program Files\JetBrains\PyCharm Community Edition 2020.3.3\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 301, in _on_run
    r = r.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 1022-1023: unexpected end of data

I tried adding the following lines to the top of the file, but I still get the error sometimes. How can I solve this?

#!/usr/bin/env python
# coding=utf-8

I also found a suggestion to re-save the file as a UTF-8 document with Notepad. I tried it, but the error still occurs sometimes.


Solution

  • There isn't a single answer to the problem as described in the question. Several issues can cause this error, so it's best to go through the possible factors in the context of the PyCharm IDE.

    1. Every Python .py file (or any other file, for that matter) has an encoding. The default encoding of a .py source code file is UTF-8. Beginners run into this problem frequently, so let's pinpoint the relevant quotes from the official documentation (to save unnecessary reading time):

      Python’s Unicode Support

      The default encoding for Python source code is UTF-8, so you can simply include a Unicode character in a string literal.

      This means that in most circumstances you shouldn't need the encoding string; see Python Source Code Encodings - PEP 263. Current practice is to keep source files encoded in UTF-8 by default and to omit the encoding string at the top of the module (this is also more concise).

    2. The PyCharm IDE has a number of encoding configurations that can successively be refined, going from global, to project, to file path. By default, everything should be set to UTF-8, especially the source code. See the official PyCharm documentation Configure file encoding settings.

    3. The exception to the above should be if you are processing external data files, in which case your source code should still be kept as UTF-8 and the data file opened with whatever encoding it requires. Most questions about the UnicodeDecodeError are about specifying the right file encoding when using the open() function to open some data file (they are not about the encoding of the source files where you are writing your code).

    4. When your own source files cause this error, a frequent cause is copy-pasting from, or opening, a source code file that is not encoded in UTF-8. (The copy-paste case is especially unexpected: you copy from a file that isn't encoded in UTF-8, and the IDE doesn't automatically convert the text you paste into the editor.) You should therefore narrow down which source code file is not encoded in UTF-8 and convert it.
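One way to narrow down the offending file, as suggested in point 4, is to try decoding every source file as UTF-8 and list the ones that fail. The following is only a minimal sketch: the function names `find_non_utf8_files` and `convert_to_utf8` are made up for illustration, and the file's real encoding (the `source_encoding` argument) is something you have to know or guess yourself, for example `cp1252` on many Western European Windows systems.

```python
from pathlib import Path


def find_non_utf8_files(root, pattern="*.py"):
    """Return the paths under `root` whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            # Decode the raw bytes strictly; any invalid sequence raises.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def convert_to_utf8(path, source_encoding):
    """Re-save `path` as UTF-8, decoding it with the encoding it really uses.

    `source_encoding` is an assumption supplied by the caller; a wrong
    guess can silently produce wrong characters, so keep a backup.
    """
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    # Working on bytes keeps the original line endings untouched.
    path.write_bytes(text.encode("utf-8"))
```

For example, `find_non_utf8_files(".")` run from the project root lists the candidates, and `convert_to_utf8(some_path, "cp1252")` rewrites one of them. Files that contain only ASCII characters always pass the check, because ASCII is a subset of UTF-8.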

    We don't have access to your project files, but to me the error message reads as the debugger trying to open a user source code file that isn't encoded in UTF-8, contrary to the IDE configuration and the module's encoding.

    File "D:\Program Files\JetBrains\PyCharm Community Edition 2020.3.3\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_comm.py"
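For reference, the reason given in the traceback, "unexpected end of data", is what Python's UTF-8 codec reports when the input stops partway through a multi-byte character. The sketch below reproduces that reason in isolation. The helper name `decode_failure_reason` and the sample character are made up for illustration, and the example says nothing about which bytes the debugger was actually reading.

```python
def decode_failure_reason(data):
    """Return the codec's failure reason for `data`, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# A character that takes three bytes in UTF-8 (chosen arbitrarily).
complete = "\u4e2d".encode("utf-8")
# Drop the last byte to simulate data that ends mid-character.
truncated = complete[:2]

print(decode_failure_reason(complete))
print(decode_failure_reason(truncated))
```

The complete byte string decodes without error, while the truncated one fails with the same reason string that appears in the traceback.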