Tags: python, bazel, xgboost

Bazel rules_python fails to copy xgboost into runfiles directory


I have a toy python project containing the following files:

requirements.txt:

numpy==1.14.3
xgboost==0.71

print_numpy_version.py:

from __future__ import print_function
import numpy
print('numpy version: %s' % numpy.version.version)

print_xgboost_version.py:

from __future__ import print_function
import xgboost
print('xgboost version: %s' % xgboost.__version__)

When I create a virtualenv, install the packages, and run the two programs, everything works exactly as you would expect it to:

$ mkvirtualenv toyvirtualenv

New python executable in /home/username/.virtualenvs/toyvirtualenv/bin/python2.7
Also creating executable in /home/username/.virtualenvs/toyvirtualenv/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /home/username/.virtualenvs/toyvirtualenv/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/username/.virtualenvs/toyvirtualenv/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/username/.virtualenvs/toyvirtualenv/bin/preactivate
virtualenvwrapper.user_scripts creating /home/username/.virtualenvs/toyvirtualenv/bin/postactivate
virtualenvwrapper.user_scripts creating /home/username/.virtualenvs/toyvirtualenv/bin/get_env_details

(toyvirtualenv) $ pip install -r requirements.txt

Collecting numpy==1.14.3 (from -r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/b8/97/ecff917542e3a8a33bc8e88c031ed50c90577fd205eab362b29f3e57c09e/numpy-1.14.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting xgboost==0.71 (from -r requirements.txt (line 2))
Collecting scipy (from xgboost==0.71->-r requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/d1/d6/3eac96ffcf7cbeb37ed72982cf3fdd3138472cb04ab32cdce1f444d765f2/scipy-1.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Installing collected packages: numpy, scipy, xgboost
Successfully installed numpy-1.14.3 scipy-1.1.0 xgboost-0.71

(toyvirtualenv) $ python print_numpy_version.py; python print_xgboost_version.py

numpy version: 1.14.3
xgboost version: 0.71

Now, I want to Bazelify my project, so I create a WORKSPACE and BUILD files with the following content:

WORKSPACE:

http_archive(
    name = "io_bazel_rules_python",
    strip_prefix = "rules_python-master",
    urls = ["https://github.com/bazelbuild/rules_python/archive/master.zip"],
)

load("@io_bazel_rules_python//python:pip.bzl", "pip_import")

pip_import(
    name = "deps",
    requirements = "//:requirements.txt",
)

load("@deps//:requirements.bzl", "pip_install")

pip_install()

BUILD:

load("@deps//:requirements.bzl", "requirement")

py_binary(
    name = "print_numpy_version",
    srcs = ["print_numpy_version.py"],
    deps = [requirement("numpy")],
)

py_binary(
    name = "print_xgboost_version",
    srcs = ["print_xgboost_version.py"],
    deps = [requirement("xgboost")],
)

Now, if I deactivate my virtualenv and bazel run //:print_numpy_version, it does exactly what you would expect:

(toyvirtualenv) $ deactivate

$ bazel run //:print_numpy_version

INFO: Analysed target //:print_numpy_version (11 packages loaded).
INFO: Found 1 target...
Target //:print_numpy_version up-to-date:
  bazel-bin/print_numpy_version
INFO: Elapsed time: 1.301s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/print_numpy_version
numpy version: 1.14.3

But, if I execute bazel run //:print_xgboost_version, it fails with an error:

$ bazel run //:print_xgboost_version

INFO: Analysed target //:print_xgboost_version (1 packages loaded).
INFO: Found 1 target...
Target //:print_xgboost_version up-to-date:
  bazel-bin/print_xgboost_version
INFO: Elapsed time: 0.231s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/print_xgboost_version
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_username/aa3d6add0ad8594663c5db9508eed16c/execroot/__main__/bazel-out/darwin-fastbuild/bin/print_xgboost_version.runfiles/__main__/print_xgboost_version.py", line 3, in <module>
    import xgboost
ImportError: No module named xgboost
ERROR: Non-zero return code '1' from command: Process exited with status 1

It seems that for the numpy case, the dependency is copied into the runfiles directory:

$ ls ./bazel-bin/print_numpy_version.runfiles/pypi__numpy_1_14_3

__init__.py     numpy           numpy-1.14.3.data   numpy-1.14.3.dist-info

(notice the numpy subdirectory in there)

But for the xgboost case, the dependency is not copied into the runfiles directory:

$ ls ./bazel-bin/print_xgboost_version.runfiles/pypi__xgboost_0_71/

__init__.py     xgboost-0.71.data   xgboost-0.71.dist-info

(where is the xgboost subdirectory?)
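One way to make this comparison less manual is a short script that lists which names are actually importable from the top level of an extracted wheel directory. The following is only a diagnostic sketch; the function name `importable_top_level` is hypothetical and not part of Bazel or rules_python, and it only looks for regular packages (directories with an `__init__.py`) and plain `.py` modules.

```python
import os


def importable_top_level(wheel_dir):
    """Return sorted names of packages and modules directly under wheel_dir.

    A directory counts as a package only if it contains __init__.py;
    a file counts as a module if it ends in .py (the directory's own
    __init__.py is skipped).
    """
    names = []
    for entry in sorted(os.listdir(wheel_dir)):
        path = os.path.join(wheel_dir, entry)
        if os.path.isdir(path):
            if os.path.isfile(os.path.join(path, "__init__.py")):
                names.append(entry)
        elif entry.endswith(".py") and entry != "__init__.py":
            names.append(entry[:-3])
    return names
```

Calling it on `./bazel-bin/print_numpy_version.runfiles/pypi__numpy_1_14_3` and on `./bazel-bin/print_xgboost_version.runfiles/pypi__xgboost_0_71` should show whether each directory exposes the package that the script tries to import.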

I must admit that I'm completely baffled by this behavior. Any guidance would be much appreciated!


Solution

  • This is a bug in rules_python. The core issue is tracked here:

    https://github.com/bazelbuild/rules_python/issues/92
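To see where the package actually ended up, it may help to search the whole extracted wheel directory rather than just its top level. The wheel format allows a `<name>-<version>.data` directory with subdirectories such as `purelib` and `platlib`, which a normal installer moves into site-packages. Whether xgboost 0.71 lands there in this setup is an assumption that has not been verified here; the sketch below just reports what it finds. The function name `find_package_dirs` is hypothetical.

```python
import os


def find_package_dirs(root, package):
    """Return sorted paths (relative to root) of directories named
    `package` that contain an __init__.py, searching root recursively."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) == package and "__init__.py" in filenames:
            matches.append(os.path.relpath(dirpath, root))
    return sorted(matches)
```

For example, `find_package_dirs("bazel-bin/print_xgboost_version.runfiles/pypi__xgboost_0_71", "xgboost")` returns an empty list if the package is missing entirely, or the nested location if it was extracted somewhere that is not on the import path.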