Tags: python, python-3.x, pip, setuptools

Oddities when pip installing a local package with --prefix and -e: potential bug with importlib.metadata not following .egg-link to .egg-info


Update: By all accounts, Python 3.8 through 3.10 do not follow the .egg-link or easy-install.pth stubs to find the .egg-info metadata, and I'm not sure why. I also tried Python 3.10.1 installed with brew; its importlib.metadata has the same problem, even though the .egg-link and easy-install.pth files are in a directory on $PYTHONPATH.

Background: Our CentOS 8 servers at work have Python 3.6.8 installed (with pip 9.0.3). When working on a project, we use the modules utility to load specific versions of programs, including Python 3.8.3 (with pip 20.2.2). Each project directory has its own bin/, lib/, etc., which lets us install project-specific Python packages into those directories. Among them is an internally developed package that we use to manage our projects through a console_scripts entry point. This package is under version control with git and may be edited during the lifetime of the project. So, within the context of a project, we want to be able to edit the package's source code while having it installed locally so that its console script can be used. This is exactly the use case for pip install --prefix project_dir -e pkg_src_dir

The problem is that this works fine with Python 3.6.8 but not with Python 3.8.3, which is what we actually use for our projects. I'm not sure whether it's a bug in the particular version of importlib.metadata included with Python 3.8.3.

I created a dummy Hello World package to try to debug this. mypkg.py defines one function that prints Hello World, and the main() function in __main__.py calls it. The structure is simple and follows python.org's own packaging tutorial.

mypkg/
├── setup.py
└── src/
   └── mypkg/
      ├── __init__.py
      ├── mypkg.py
      └── __main__.py

With Python 3.6.8 and its pip 9.0.3, pip install --prefix project_dir -e mypkg works just as you'd expect. project_dir/lib/python-3.6/site-packages contains the mypkg.egg-link file, which points to the mypkg/src directory, and project_dir/bin contains the mypkg console script:

#!/usr/bin/python3.6
# EASY-INSTALL-ENTRY-SCRIPT: 'mypkg','console_scripts','mypkg'
__requires__ = 'mypkg'
import re
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('mypkg', 'console_scripts', 'mypkg')()
    )

By prepending the project_dir/lib/python-3.6/site-packages directory to $PYTHONPATH, I am able to run this mypkg console script without issue and have it print Hello World. I can even run it with Python 3.8.3 by invoking that interpreter directly: python-3.8.3 ./mypkg. As I would later discover, this works because the script uses the older load_entry_point function from pkg_resources rather than the newer importlib.metadata.

However, if I install that same package with Python 3.8.3 in exactly the same way, the console script fails to run, even after updating $PYTHONPATH to project_dir/lib/python-3.8/site-packages.

Traceback (most recent call last):
  File "./mypkg", line 33, in <module>
    sys.exit(load_entry_point('mypkg', 'console_scripts', 'mypkg')())
  File "./mypkg", line 22, in importlib_load_entry_point
    for entry_point in distribution(dist_name).entry_points
  File "/tools/conda/anaconda3/2020.07/lib/python3.8/importlib/metadata.py", line 504, in distribution
    return Distribution.from_name(distribution_name)
  File "/tools/conda/anaconda3/2020.07/lib/python3.8/importlib/metadata.py", line 177, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: mypkg

The console script stub generated by the Python 3.8.3 install is considerably different; the changes relate to using importlib.metadata, rather than pkg_resources, to provide the load_entry_point function.

#!/tools/conda/anaconda3/2020.07/bin/python
# EASY-INSTALL-ENTRY-SCRIPT: 'mypkg','console_scripts','mypkg'
import re
import sys

# for compatibility with easy_install; see #2198
__requires__ = 'mypkg'

try:
    from importlib.metadata import distribution
except ImportError:
    try:
        from importlib_metadata import distribution
    except ImportError:
        from pkg_resources import load_entry_point


def importlib_load_entry_point(spec, group, name):
    dist_name, _, _ = spec.partition('==')
    matches = (
        entry_point
        for entry_point in distribution(dist_name).entry_points
        if entry_point.group == group and entry_point.name == name
    )
    return next(matches).load()


globals().setdefault('load_entry_point', importlib_load_entry_point)


if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(load_entry_point('mypkg', 'console_scripts', 'mypkg')())

The fascinating thing is that this console script works when run directly with the Python 3.6.8 binary, even though the two installations share the same local paths in their sys.path search (i.e. project_dir/lib/python-3.8/site-packages). Only their system/installation-specific paths differ, and the local mypkg should not be found there. EDIT: this makes sense, since under Python 3.6.8 the try/except blocks around the imports fall back to the older pkg_resources version of load_entry_point.

I also discovered that if I add the from pkg_resources import load_entry_point line from the Python 3.6.8 console script to the Python 3.8.3 console script, I no longer get the error when running that script with Python 3.8.3. EDIT: this again makes total sense, as the root of the issue is importlib.metadata; with load_entry_point already imported, the globals().setdefault(...) call in the stub leaves the pkg_resources version in place.
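The behaviour described above (importlib.metadata fails, pkg_resources succeeds) can be sketched as a loader that falls back only when the distribution is not found, rather than only on ImportError as the generated stub does. This is a minimal sketch, not what setuptools generates: the function name and the fallback parameter are hypothetical, and it assumes Python 3.8+ and that pkg_resources (deprecated in recent setuptools releases) is still importable.

```python
from importlib import metadata


def _pkg_resources_loader(dist_name, group, name):
    # Imported lazily so the module still loads when setuptools is absent.
    from pkg_resources import load_entry_point
    return load_entry_point(dist_name, group, name)


def load_entry_point_with_fallback(dist_name, group, name, fallback=None):
    """Load an entry point via importlib.metadata, else via a fallback.

    `fallback` is a hypothetical hook; by default it delegates to
    pkg_resources.load_entry_point.
    """
    if fallback is None:
        fallback = _pkg_resources_loader
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # importlib.metadata could not locate the metadata on sys.path,
        # which is the failure shown in the traceback above.
        return fallback(dist_name, group, name)
    for entry_point in dist.entry_points:
        if entry_point.group == group and entry_point.name == name:
            return entry_point.load()
    raise LookupError(
        "no entry point {!r} in group {!r} for {!r}".format(name, group, dist_name)
    )
```

In a console script this would be used as sys.exit(load_entry_point_with_fallback('mypkg', 'console_scripts', 'mypkg')()). It only works around the symptom; the metadata is still invisible to importlib.metadata itself.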

Here's my setup.py for full disclosure. I'm not sure if there's something I can add to that to overcome this issue so that python 3.8.3 can run --prefix --editable pip installed packages.

import setuptools

setuptools.setup(
    name="mypkg",
    version="0.1.0",
    entry_points={
        'console_scripts': ['mypkg=mypkg.__main__:main']
    },
    package_dir={"": "src"},
    packages=setuptools.find_packages(where="src"),
    python_requires=">=3.6",
)

UPDATE: After looking into this a bit more, it's clear that the importlib.metadata module in Python 3.8.3 is causing this issue. The script works with the older from pkg_resources import load_entry_point but not with from importlib.metadata import distribution. I found that I can get it to work if I add the mypkg source path to my $PYTHONPATH, presumably because importlib.metadata then finds the mypkg.egg-info directory there. But how do I get importlib.metadata to find the mypkg metadata in editable mode without adding the original source directory?
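To check which directories actually hold metadata for a package, a small diagnostic like the following may help. It is a rough sketch: the function name is hypothetical and the name matching is a loose prefix check, not the lookup importlib.metadata performs. Going by the behaviour described above, a directory that holds only an .egg-link would not be enough for importlib.metadata, while one holding the .egg-info directory should be.

```python
import os
import sys


def find_metadata_markers(dist_name, search_paths=None):
    """Map each search path to the metadata-like entries found for dist_name.

    Looks for *.egg-info, *.dist-info and *.egg-link entries whose name
    starts with the (loosely normalised) distribution name.
    """
    if search_paths is None:
        search_paths = sys.path
    prefix = dist_name.replace("-", "_").lower()
    found = {}
    for entry in search_paths:
        try:
            # An empty sys.path entry means the current directory.
            names = os.listdir(entry or ".")
        except OSError:
            # Missing directories and zip files are skipped in this sketch.
            continue
        hits = sorted(
            n for n in names
            if n.lower().replace("-", "_").startswith(prefix)
            and n.endswith((".egg-info", ".dist-info", ".egg-link"))
        )
        if hits:
            found[entry] = hits
    return found
```

Calling find_metadata_markers("mypkg") with the --prefix site-packages directory on $PYTHONPATH should list mypkg.egg-link for that directory, and mypkg.egg-info only once the source directory is on the path as well.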


Solution

  • For more details on this issue and a workable solution, check out

    https://github.com/python/importlib_metadata/issues/364

    Basically, you'll need to create a sitecustomize.py file in your --prefix site-packages directory (which you have also added to your $PYTHONPATH). Python imports sitecustomize automatically at startup, so it can be used to append paths to sys.path. The code below reads .pth files and appends the paths they list:

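    import os
    import io
    import sys
    
    # Directory containing this sitecustomize.py, i.e. the --prefix
    # site-packages directory; listing '.' would scan the current
    # working directory instead.
    site_dir = os.path.dirname(os.path.abspath(__file__))
    
    try:
        names = os.listdir(site_dir)
    except OSError:
        names = []  # nothing to process if the directory is unreadable
    
    names = [name for name in names if name.endswith(".pth")]
    for name in sorted(names):
        try:
            # io.open_code() requires Python 3.8+
            f = io.TextIOWrapper(io.open_code(os.path.join(site_dir, name)),
                                 encoding="utf-8")
        except OSError:
            continue  # skip unreadable .pth files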
        with f:
            for n, line in enumerate(f):
                if line.startswith("#"):
                    continue
                if line.strip() == "":
                    continue
                try:
                    if line.startswith(("import ", "import\t")):
                        exec(line)
                        continue
                    line = line.rstrip()
                    if os.path.exists(line):
                        sys.path.append(line)
                except Exception:
                    print("Error processing line {:d} of {}:\n".format(n+1, name),
                          file=sys.stderr)
                    import traceback
                    for record in traceback.format_exception(*sys.exc_info()):
                        for line in record.splitlines():
                            print('  '+line, file=sys.stderr)
                    print("\nRemainder of file ignored", file=sys.stderr)
                    break
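A shorter variant of the same idea, offered as an assumption rather than something taken from the linked issue: the standard library's site.addsitedir() adds a directory to sys.path and processes the .pth files in it, so sitecustomize.py could delegate to it. The helper name below is hypothetical.

```python
import os
import site


def add_prefix_site_dir(site_dir=None):
    """Process the .pth files in site_dir (default: this file's directory).

    site.addsitedir() appends site_dir to sys.path if it is not already
    there and then handles each .pth file it contains.
    """
    if site_dir is None:
        site_dir = os.path.dirname(os.path.abspath(__file__))
    site.addsitedir(site_dir)
```

In sitecustomize.py the whole file would then be this helper followed by a call to add_prefix_site_dir(). Whether this behaves identically to the longer script for every .pth file has not been verified here.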