
Non UTF-8 encoded character in PyDev breaks in Debug Mode


I'm posting this so it shows up on Google in case anyone else runs into it; I spent 8 hours uninstalling and reinstalling to get it working.

The Python script ran fine on its own, but in Debug mode it would blow up and behave erratically, and it would not step into the method containing the thorn character.

pydev debugger: starting (pid: 8612)
Traceback (most recent call last):
  File "C:\Users\RH1832\.p2\pool\plugins\org.python.pydev_6.2.0.201711281614\pysrc\pydevd.py", line 1621, in <module>
    main()
  File "C:\Users\Ryan\.p2\pool\plugins\org.python.pydev_6.2.0.201711281614\pysrc\pydevd.py", line 1615, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Users\Ryan\.p2\pool\plugins\org.python.pydev_6.2.0.201711281614\pysrc\pydevd.py", line 1022, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Users\Ryan\.p2\pool\plugins\org.python.pydev_6.2.0.201711281614\pysrc\_pydev_imps\_pydev_execfile.py", line 20, in execfile
    contents = stream.read()
  File "C:\Users\Ryan\Python3\env\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 2151: invalid start byte

Solution

  • The general problem is that in Python 2 you need an encoding declaration in any source file with non-ASCII characters, and in Python 3 (where UTF-8 is the default source encoding) you need one in any source file with non-ASCII characters that isn't saved as UTF-8.

    Your source code appears to be in Latin-1 (otherwise the thorn would be the two bytes \xc3\xbe instead of the one byte \xfe), so without a declaration it's illegal. (I'm sure you know the simple answer, which is to save your files as UTF-8 rather than Latin-1, as well as the even simpler answer, which is to keep non-ASCII characters out of comments. The hard part is finding the problem in the first place, not working around it once you've found it.)
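
To make the "finding the problem" step concrete, here is a minimal sketch that scans raw bytes for sequences that are not valid UTF-8 and reports where they are. The helper name `find_invalid_utf8` is made up for this illustration; it is not part of Python or PyDev.

```python
def find_invalid_utf8(data: bytes):
    """Yield (offset, line_number, byte_value) for each spot that fails UTF-8 decoding.

    Hypothetical helper for illustration only.
    """
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            bad = pos + exc.start  # exc.start is relative to the slice
            line_no = data.count(b"\n", 0, bad) + 1
            yield bad, line_no, data[bad]
            pos += exc.end  # resume after the undecodable bytes


# The thorn (U+00FE) is one byte in Latin-1 but two bytes in UTF-8.
thorn = "\u00fe"
latin1_bytes = thorn.encode("latin-1")  # b'\xfe'
utf8_bytes = thorn.encode("utf-8")      # b'\xc3\xbe'

# A tiny stand-in for a Latin-1 source file with a thorn in a comment.
sample = ("# thorn: " + thorn + "\nx = 1\n").encode("latin-1")
problems = list(find_invalid_utf8(sample))
for offset, line_no, value in problems:
    print(f"byte {value:#04x} at offset {offset}, line {line_no}")
```

For a real file you would pass `open(path, "rb").read()` instead of `sample`. The reported offset should correspond to the "position" in the UnicodeDecodeError message.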

    The way the CPython tokenizer works, it will often not notice illegal bytes if they appear only in comments. So a module may import fine, or a script may execute fine, until something needs to build a string from the line of source code containing that comment. I don't know the internals of PyDev very well, but the traceback above shows its execfile helper reading and decoding the whole file (stream.read()) before executing it, and that decode is where the error surfaces.

    Ideally, you would get this error every time you run the script or import the module, not only when you're debugging. There may turn out to be a good reason Python doesn't and shouldn't do that, but it's worth checking whether the latest version of Python still behaves this way and whether the problem is actually specific to PyDev's debugger. Then file a bug or raise the issue on python-dev (or, if it does turn out to be specific to PyDev, file a bug with that project instead).
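
Until the interpreter reports this consistently, one option is a small pre-flight check that decodes a whole source file under the encoding it declares (or UTF-8 if it declares none). The sketch below uses the standard library's `tokenize.detect_encoding`. The function name `check_source_decodes` is an assumption for this example, and it is not a description of what PyDev or CPython does internally.

```python
import io
import tokenize


def check_source_decodes(data: bytes):
    """Return (encoding, error_message); error_message is None if the bytes decode cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        # Looks at a BOM and the first two lines for a coding declaration;
        # falls back to 'utf-8' when none is found.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    except SyntaxError as exc:
        return None, str(exc)
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return encoding, str(exc)
    return encoding, None


# Latin-1 bytes with no declaration: rejected under the UTF-8 default.
undeclared = b"x = 1\n# thorn: \xfe\n"
# The same kind of content with a declaration on the first line: accepted.
declared = b"# -*- coding: latin-1 -*-\n# thorn: \xfe\nx = 1\n"

print(check_source_decodes(undeclared))
print(check_source_decodes(declared))
```

Running this over every `.py` file in a project (reading each with `open(path, "rb")`) would flag files like the one in the question before a debugger ever touches them.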