Unexpected chardet dependency with wandb: how to fix it, and why does it happen?


I got this error:

import wandb

...

Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 52, in <module>
    from uutils.logging_uu.wandb_logging.common import setup_wandb
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/common.py", line 8, in <module>
    import wandb
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/wandb/__init__.py", line 26, in <module>
    from wandb import sdk as wandb_sdk
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/wandb/sdk/__init__.py", line 3, in <module>
    from . import wandb_helper as helper  # noqa: F401
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/wandb/sdk/wandb_helper.py", line 6, in <module>
    from .lib import config_util
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/wandb/sdk/lib/config_util.py", line 10, in <module>
    from wandb.util import load_yaml
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/wandb/util.py", line 49, in <module>
    import requests
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
    from .exceptions import RequestsDependencyWarning
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
    from .compat import JSONDecodeError as CompatJSONDecodeError
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
    import charset_normalizer as chardet
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
    from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
  File "/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
    from charset_normalizer.md import mess_ratio
  File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/lfs/ampere1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/charset_normalizer/constant.py)
(data_quality) brando9~ $ pip install chardet

Collecting chardet
  Downloading chardet-5.1.0-py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.1/199.1 kB 7.7 MB/s eta 0:00:00
Installing collected packages: chardet
Successfully installed chardet-5.1.0

Why? This seems odd. The following appears to have fixed it, though:

pip install --upgrade pip
pip install chardet  # might be needed if wandb acts weird

fixed it. But why did this happen in the first place?

https://community.wandb.ai/t/odd-error-needing-chardet-with-wandb/4719


Edit

idk why this is needed but this seems to work:

pip install --upgrade pip
pip install wandb --upgrade
if ! pip show chardet > /dev/null; then
    pip install chardet
fi
if ! pip show cchardet > /dev/null; then
    pip install cchardet
fi
python -c "import uutils; uutils.torch_uu.gpu_test()"

shouldn't be happening in the first place...


Solution

  • This is possibly related to a mix-up between the pip and conda packages, as a closed GitHub issue suggests:

    https://github.com/Ousret/charset_normalizer/issues/278

    The maintainer says:

    Yes, you have 3.1.0 installed, but for some reason, it is trying to import charset_normalizer.constant from the v2.0.4 (conda original depo) which does not have the constant COMMON_SAFE_ASCII_CHARACTERS in it.

    In python3.10/site-packages/requests/compat.py you have:

    try:
        import chardet
    except ImportError:
        import charset_normalizer as chardet
    

    That's why installing chardet fixed your issue. Installing the current version of charset_normalizer (3.2.0) with pip should have worked as well.
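
To make the fallback concrete, here is a minimal sketch of the same try-in-order pattern. The helper name `pick_backend` and its return shape are my own, not part of requests; unlike the real `compat.py`, it records the errors instead of letting the last one propagate.

```python
import importlib


def pick_backend(candidates=("chardet", "charset_normalizer")):
    """Try each module in order; return the first one that imports.

    Hypothetical helper mirroring the try/except fallback shown above.
    Returns (module_name_or_None, {failed_name: error_text}).
    """
    errors = {}
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # ModuleNotFoundError lands here too, since it subclasses ImportError.
            errors[name] = f"{type(exc).__name__}: {exc}"
        else:
            return name, errors
    # Nothing importable: the real compat.py would raise at this point.
    return None, errors


if __name__ == "__main__":
    print(pick_backend())
```

Run inside the affected environment, the printed tuple shows which of the two modules actually imports and what went wrong with the other.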

    See also here, here and another discussion on SO here

    Some people have suggested that the charset_normalizer 3.1.0 pip release may be at fault, but at the time of checking I cannot confirm that.

    COMMON_SAFE_ASCII_CHARACTERS is present in constant.py at least for pip versions 2.1.0, 3.1.0 and 3.2.0.

    It seems that under some circumstances conda installs a very old charset_normalizer 2.0.4 but reports a wrong version.

    Edit (as requested in the comments):

    Since the question was why this happens, and you asked how to arrive at some kind of conclusion, it makes sense to look at the exception message first.

    1. We can see two tracebacks, both originating from python3.10/site-packages/requests/compat.py. Above I've included the relevant part of the code: it tries to import chardet and, on an ImportError, imports charset_normalizer instead. If that fails as well, the error propagates, since the code cannot run without one of the two. requests is a dependency of wandb.

    ModuleNotFoundError is a built-in subclass of ImportError; it was raised because you didn't have chardet installed at that point.
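
The two exception types in the tracebacks can be reproduced with the standard library alone. This is only an illustration; the helper `classify_import_failure` and the deliberately missing names are made up for the demo.

```python
def classify_import_failure(statement):
    """Execute an import statement and return the name of the exception
    type it raises, or None if the import succeeds. Hypothetical helper."""
    try:
        exec(statement, {})
    except ImportError as exc:
        # Catches ModuleNotFoundError as well, because it is a subclass.
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # A module that does not exist at all (like chardet in the first traceback).
    print(classify_import_failure("import no_such_module_gr_demo"))
    # A module that exists but lacks the requested name (like the second traceback).
    print(classify_import_failure("from json import no_such_name_gr_demo"))
```

The first case corresponds to the missing chardet, the second to the "cannot import name" error from charset_normalizer.constant: the module was found, but the name was not in it.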

    2. The second traceback results from the failed attempt to import charset_normalizer. To understand why that failed even though you had a version of it installed, we can follow the traceback down to its last import:

      from charset_normalizer.md import mess_ratio

    In charset_normalizer/md.py we can find:

    from functools import lru_cache
    from logging import getLogger
    from typing import List, Optional
    
    from .constant import (
        COMMON_SAFE_ASCII_CHARACTERS,
        ...
    

    And with that we finally take a look into charset_normalizer/constant.py only to indeed find COMMON_SAFE_ASCII_CHARACTERS defined there:

    COMMON_SAFE_ASCII_CHARACTERS: Set[str] = {
        "<",
        ">",
        "=",
        ...
    

    And although the name is clearly present there, your error message insisted:

    ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant'
      
    
    3. There are several possible explanations for that, but the most likely are a different version of charset_normalizer or a botched install. Looking at recent versions of that package, as well as at open issues on GitHub, we find that this name was introduced back in Sep 2019 with 2.0.7, as in the diff.

    4. The way you presented your question, and the fact that you're on Python 3.10, made it seem unlikely that you would neglect to update your packages when needed, so I set that possibility aside. Instead I looked into the closed issues and found one that looked very similar to your situation. A Google search revealed that users of all kinds of other packages depending on charset_normalizer had experienced the same issue. Interestingly, one person wrote that they had the issue in one conda env but not in another. This is a relevant clue.

    5. People advised each other to use pip to install charset_normalizer 2.1.0, or simply chardet, and that solved their issue. Checking the conda packages then showed that they are fine as well. But those are the conda-forge (the community channel of conda) packages; if we look at the search results for charset-normalizer, we can see that the default channel provides the outdated 2.0.4 version instead (the one that lacks the name that the import chain started by requests needs).

    So, what happened? It is impossible to conclude with certainty, but it seems fairly clear that you had previously gotten charset_normalizer==2.0.4 from the conda default (main) channel. The error then follows from three things together: chardet was (at that point) absent, the charset_normalizer files being imported were old, and requests is still fine with charset_normalizer 2.0.0 (apparently for reasons concerning newer versions, as a quick look at their issue discussions suggests), so the old version satisfied its requirement. Additionally, you may have run into a problem from mixing pip and conda, or some undetected collision: other people with the same error were shown newer charset_normalizer versions while the old default conda 2.0.4 files were still present.
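
If you want to check for this kind of mismatch yourself, a rough diagnostic could compare what the installer metadata claims with what is actually on disk. This is a sketch under assumptions: the function `inspect_install` and its report keys are my own invention, and the substring check on the source file is crude. It uses only `importlib.metadata` and `importlib.util.find_spec`, so it does not need the broken package to import successfully.

```python
import importlib.util
from importlib import metadata
from pathlib import Path


def inspect_install(dist_name, package, submodule_file, required_name):
    """Compare installer metadata with the files the import system would load.

    Hypothetical helper. Reports every installed version recorded for
    dist_name, the directory `package` resolves to, and whether
    required_name appears in package/submodule_file (plain substring test).
    """
    report = {"metadata_versions": [], "package_dir": None, "name_in_source": None}
    wanted = dist_name.lower().replace("_", "-")

    # More than one entry here hints at overlapping pip/conda installs.
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name") or ""
        except Exception:
            continue  # skip entries with unreadable metadata
        if name.lower().replace("_", "-") == wanted:
            report["metadata_versions"].append(dist.version)

    # find_spec locates a top-level package without executing its __init__.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return report

    package_dir = Path(list(spec.submodule_search_locations)[0])
    report["package_dir"] = str(package_dir)
    source = package_dir / submodule_file
    if source.is_file():
        text = source.read_text(encoding="utf-8", errors="replace")
        report["name_in_source"] = required_name in text
    return report


if __name__ == "__main__":
    print(inspect_install("charset-normalizer", "charset_normalizer",
                          "constant.py", "COMMON_SAFE_ASCII_CHARACTERS"))
```

A report that lists a recent version (or several versions) in `metadata_versions` while `name_in_source` is False would be consistent with the explanation above: the metadata says one thing, the files being imported say another.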

    Generally, I'd say this is something that happens and should be solved pragmatically, just as you did by installing chardet. Possibly also avoid the default conda channel (if you used it in the past) and use the conda-forge channel instead, as it seems that a community is more capable of maintaining packages than any single entity (who would have thought). If you want to dig deeper, you'd have to look at the evolving relationship between chardet, charset_normalizer and requests. I'm certain there is a rich and interesting history.

    Since you already solved the practical issue, just a few short notes on what else you could have tried (no guarantees, as I don't know the specifics of your environment):

    • just use a new environment, making sure to set conda channel_priority to flexible and conda-forge as your channel of choice
    • remove charset_normalizer and reinstall it from the conda-forge channel or with pip