python unicode pylint

Pylint fails due to UnicodeError


I run Pylint on my code whenever I commit. Recently, a commit failed with the following error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 1699-1713: character maps to <undefined>

Here's the traceback:

Traceback (most recent call last):
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "\venv\tso_ingestion\Scripts\pylint.EXE\__main__.py", line 7, in <module>
  File "\venv\tso_ingestion\lib\site-packages\pylint\__init__.py", line 36, in run_pylint
    PylintRun(argv or sys.argv[1:])
  File "\venv\tso_ingestion\lib\site-packages\pylint\lint\run.py", line 213, in __init__
    linter.check(args)
  File "\venv\tso_ingestion\lib\site-packages\pylint\lint\pylinter.py", line 701, in check
    with self._astroid_module_checker() as check_astroid_module:
  File "\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 142, in __exit__
    next(self.gen)
  File "\venv\tso_ingestion\lib\site-packages\pylint\lint\pylinter.py", line 1010, in _astroid_module_checker
    checker.close()
  File "\venv\tso_ingestion\lib\site-packages\pylint\checkers\similar.py", line 875, in close
    self.add_message("R0801", args=(len(couples), "\n".join(msg)))
  File "\venv\tso_ingestion\lib\site-packages\pylint\checkers\base_checker.py", line 164, in add_message
    self.linter.add_message(
  File "\venv\tso_ingestion\lib\site-packages\pylint\lint\pylinter.py", line 1323, in add_message
    self._add_one_message(
  File "\venv\tso_ingestion\lib\site-packages\pylint\lint\pylinter.py", line 1281, in _add_one_message
    self.reporter.handle_message(
  File "\venv\tso_ingestion\lib\site-packages\pylint\reporters\text.py", line 208, in handle_message
    self.write_message(msg)
  File "\venv\tso_ingestion\lib\site-packages\pylint\reporters\text.py", line 201, in write_message
    self.writeln(self._fixed_template.format(**self_dict))
  File "\venv\tso_ingestion\lib\site-packages\pylint\reporters\base_reporter.py", line 64, in writeln
    print(string, file=self.out)
  File "\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1699-1713: character maps to <undefined>

The only change I can see that could possibly result in an encoding error was refactoring a set of asserts that looked like this:

    assert e_info.type is PageException
    assert e_info.value.args[0].url == url
    assert e_info.value.args[0].body == soup.body
    assert e_info.value.args[0].element == soup.body.find("form", id="form").find(
        "a", string="料金通知情報一覧"
    )
    assert (
        "Onclick event associated with \\'料金通知情報一覧\\' link was missing or malformed"
        in str(e_info.value.args[0])
    )

to look like this:

    check_page_exception(
        e_info,
        url,
        soup.body,
        soup.body.find("form", id="form").find("a", string="料金通知情報一覧"),
        "Onclick event associated with \\'料金通知情報一覧\\' link was missing or malformed",
    )

There are Unicode characters in here, but they have only moved around, so I don't see how this could cause the error. Does anyone know how to fix this?


Solution

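  • Thanks to a comment from @KlausD., I was able to diagnose and fix the issue. The problem was that my shell was set to the ANSI character set (cp1252, as the last frame of the traceback shows). I had never had a problem with this before, but Pylint prints its messages using the shell's default character set. Much of my code involves text written in Japanese, and Pylint had certainly reported errors in it before, but none of those messages involved actually printing the text. This time the traceback shows the failing message was R0801 from the similarity checker, whose text includes the duplicated source lines, so the Japanese strings had to be written to the console and cp1252 could not encode them. The answer provided here fixed this issue.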
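The failure at the bottom of the traceback can be reproduced without Pylint. The sketch below is only an illustration: `can_print` is a hypothetical helper, not part of Pylint, and the in-memory stream stands in for a console whose encoding is cp1252. It writes a string to a text stream with a given encoding and reports whether the write succeeds.

```python
import io


def can_print(text: str, encoding: str) -> bool:
    """Return True if `text` can be printed to a stream using `encoding`.

    Hypothetical helper: the in-memory stream imitates a console whose
    default encoding is `encoding` (for example cp1252 on Windows).
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        # Same call as in pylint/reporters/base_reporter.py in the traceback.
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    message = "Onclick event associated with '料金通知情報一覧' link"
    for enc in ("cp1252", "utf-8"):
        print(enc, can_print(message, enc))
```

If this matches what happened here, making Python's standard output use UTF-8 should avoid the error. Two standard ways to do that, which may or may not be what the linked answer recommends, are setting the environment variable `PYTHONIOENCODING=utf-8` (or `PYTHONUTF8=1`) in the shell before running Pylint, or calling `sys.stdout.reconfigure(encoding="utf-8")` in scripts you control (Python 3.7+).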