Add numpy.get_include() argument to setuptools without preinstalled numpy


I am currently developing a python package that uses cython and numpy and I want the package to be installable using the pip install command from a clean python installation. All dependencies should be installed automatically. I am using setuptools with the following setup.py:

import setuptools

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setuptools.setup(
    name="my_lib",
    version="0.0.1",
    author="Me",
    author_email="[email protected]",
    description="Some python library",
    packages=["my_lib"],
    ext_modules=[my_c_lib_ext],
    setup_requires=["cython >= 0.29"],
    install_requires=["numpy >= 1.15"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent"
    ]
)

This has worked great so far. The pip install command downloads cython for the build and is able to build my package and install it together with numpy.

Now I want to improve the performance of my cython code, which leads to some changes in my setup.py. I need to add include_dirs=[numpy.get_include()] to the call of either setuptools.Extension(...) or setuptools.setup(...), which means that I also need to import numpy. (See http://docs.cython.org/en/latest/src/tutorial/numpy.html and "Make distutils look for numpy header files in the correct place" for the rationale.)

This is bad. Now the user cannot call pip install from a clean environment, because import numpy will fail. The user needs to pip install numpy before installing my library. Even if I move "numpy >= 1.15" from install_requires to setup_requires the installation fails, because the import numpy is evaluated earlier.

Is there a way to evaluate the include_dirs at a later point of the installation, for example, after the dependencies from setup_requires or install_requires have been resolved? I would really like to have all dependencies resolved automatically, and I don't want the user to type multiple pip install commands.

The following snippet works, but it is not officially supported because it uses an undocumented (and private) method:

class NumpyExtension(setuptools.Extension):
    # setuptools calls this function after installing dependencies
    def _convert_pyx_sources_to_lang(self):
        import numpy
        self.include_dirs.append(numpy.get_include())
        super()._convert_pyx_sources_to_lang()

my_c_lib_ext = NumpyExtension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

The article "How to Bootstrap numpy installation in setup.py" proposes using a cmdclass with a custom build_ext class. Unfortunately, this breaks the build of the cython extension because cython also customizes build_ext.


Solution

  • First question: when is numpy needed? It is needed during the setup (i.e. when the build_ext functionality is called) and after the installation, when the module is used. That means numpy should be in both setup_requires and install_requires.

    There are the following alternatives for solving the issue during the setup:

    1. using PEP 517/518 (which is more straightforward IMO)
    2. using the setup_requires argument of setup and postponing the import of numpy until the setup's requirements are satisfied (which is not the case at the start of setup.py's execution)

    PEP 517/518-solution:

    Put a pyproject.toml file next to setup.py, with the following content:

    [build-system]
    requires = ["setuptools", "wheel", "Cython>=0.29", "numpy >= 1.15"]
    

    which defines packages needed for building, and then install using pip install . in the folder with setup.py. A disadvantage of this method is that python setup.py install no longer works, as it is pip that reads pyproject.toml. However, I would use this approach whenever possible.


    Postponing import

    This approach is more complicated and somewhat hacky, but it also works without pip.

    First, let's take a look at the unsuccessful attempts so far:

    pybind11-trick

    @chrisb's "pybind11"-trick, which can be found here: with the help of an indirection, one delays the call to import numpy until numpy is present during the setup phase, i.e.:

    class get_numpy_include(object):
    
        def __str__(self):
            import numpy
            return numpy.get_include()
    ...
    my_c_lib_ext = setuptools.Extension(
        ...
        include_dirs=[get_numpy_include()]
    )
    

    Clever! The problem: it doesn't work with the Cython compiler. Somewhere down the line, Cython passes the get_numpy_include object to os.path.join(...,...), which checks whether the argument is really a string, which it obviously isn't.

    This could be fixed by inheriting from str, but the above shows the dangers of the approach in the long run: it bypasses the intended mechanisms, is brittle and may easily fail in the future.
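
    The os.path.join failure is easy to reproduce in isolation. The following minimal sketch uses only the standard library. StrOnlyInclude, PathLikeInclude, resolve_include_dir and try_join are hypothetical names, and sysconfig.get_paths()["include"] merely stands in for numpy.get_include() so that the snippet runs without numpy. It also shows that an object implementing __fspath__ is accepted by os.path.join; whether every code path in setuptools and Cython accepts such an object is an assumption that has not been verified here.

```python
import os
import sysconfig


def resolve_include_dir():
    # Stand-in for numpy.get_include() (assumption for this demo only).
    return sysconfig.get_paths()["include"]


class StrOnlyInclude:
    """Delays the lookup until str() is called, as in the pybind11-trick."""

    def __str__(self):
        return resolve_include_dir()


class PathLikeInclude:
    """Delays the lookup as well, but also implements the os.PathLike protocol."""

    def __str__(self):
        return resolve_include_dir()

    def __fspath__(self):
        return resolve_include_dir()


def try_join(include_dir):
    # Return the joined path, or the TypeError raised by os.path.join.
    try:
        return os.path.join(include_dir, "numpy", "arrayobject.h")
    except TypeError as exc:
        return exc


str_only_result = try_join(StrOnlyInclude())
path_like_result = try_join(PathLikeInclude())

print(type(str_only_result).__name__)  # the __str__-only object is rejected
print(path_like_result)                # the path-like object is joined lazily
```

    os.path.join never calls str() on its arguments, so an object that only defines __str__ is rejected no matter how late the lookup happens.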

    the classical build_ext-solution

    It looks as follows:

    ...
    from setuptools.command.build_ext import build_ext as _build_ext
    
    class build_ext(_build_ext):
        def finalize_options(self):
            _build_ext.finalize_options(self)
            # Prevent numpy from thinking it is still in its setup process:
            __builtins__.__NUMPY_SETUP__ = False
            import numpy
            self.include_dirs.append(numpy.get_include())
    
    setuptools.setup(
        ...
        cmdclass={'build_ext':build_ext},
        ...
    )
    

    However, this solution doesn't work with Cython extensions either, because pyx-files don't get recognized.

    The real question is, how did pyx-files get recognized in the first place? The answer is this part of setuptools.command.build_ext:

    ...
    try:
        # Attempt to use Cython for building extensions, if available
        from Cython.Distutils.build_ext import build_ext as _build_ext
        # Additionally, assert that the compiler module will load
        # also. Ref #1229.
        __import__('Cython.Compiler.Main')
    except ImportError:
        _build_ext = _du_build_ext
    ...
    

    That means setuptools tries to use Cython's build_ext if possible, and because the import of the module is delayed until build_ext is called, it finds Cython present.

    The situation is different when setuptools.command.build_ext is imported at the beginning of setup.py: Cython isn't present yet, and a fallback without Cython functionality is used.
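
    The effect of this timing can be simulated without setuptools or Cython. In the sketch below, fake_cython_for_demo is a made-up module name and select_build_ext_base is a hypothetical helper that mirrors the try/except ImportError pattern shown above. Registering the module in sys.modules stands in for setup_requires making Cython available.

```python
import importlib
import sys
import types

FAKE_MODULE = "fake_cython_for_demo"  # made-up name, stands in for Cython


def select_build_ext_base():
    # Same pattern as in setuptools.command.build_ext: the decision is made
    # at the moment this code runs, not when the result is used later.
    try:
        importlib.import_module(FAKE_MODULE)
        return "Cython build_ext"
    except ImportError:
        return "plain distutils build_ext"


# "Top of setup.py": the dependency has not been installed yet.
early_choice = select_build_ext_base()

# Simulate setup_requires making the dependency importable.
sys.modules[FAKE_MODULE] = types.ModuleType(FAKE_MODULE)

# "Inside build_ext": the same check now succeeds.
late_choice = select_build_ext_base()

print(early_choice)  # plain distutils build_ext
print(late_choice)   # Cython build_ext
```

    A class derived from the early result keeps the fallback base even after Cython has been installed, which is why the import has to be postponed.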

    mixing up pybind11-trick and classical solution

    So let's add an indirection, so we don't have to import setuptools.command.build_ext directly at the beginning of setup.py:

    ...
    # factory function
    def my_build_ext(pars):
        # import delayed:
        from setuptools.command.build_ext import build_ext as _build_ext

        # include_dirs adjusted:
        class build_ext(_build_ext):
            def finalize_options(self):
                _build_ext.finalize_options(self)
                # Prevent numpy from thinking it is still in its setup process:
                __builtins__.__NUMPY_SETUP__ = False
                import numpy
                self.include_dirs.append(numpy.get_include())

        # object returned:
        return build_ext(pars)
    ...
    setuptools.setup(
        ...
        cmdclass={'build_ext' : my_build_ext},
        ...
    )