Requirement: To be able to install the rpy2 library, as the code to be orchestrated with airflow uses it extensively
Current Dockerfile
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-base-core r-cran-randomforest python3.6 python3-pip python3-setuptools python3-dev&& \
rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==49.6.0
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
Issue: I'm having issues surrounding the necessary libraries, which didn't come up in the code itself.
The Error:
[6/8] RUN python3 -m pip install rpy2:
1.176 Collecting rpy2
1.304 Downloading rpy2-3.5.14.tar.gz (219 kB)
1.422 Installing build dependencies: started
4.186 Installing build dependencies: finished with status 'done'
4.187 Getting requirements to build wheel: started
4.225 Getting requirements to build wheel: finished with status 'error'
4.225 ERROR: Command errored out with exit status 1:
4.225 command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpff4u1mul
4.225 cwd: /tmp/pip-install-12iwr626/rpy2
4.225 Complete output (31 lines):
4.225 Traceback (most recent call last):
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
4.225 main()
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
4.225 json_out['return_val'] = hook(**hook_input['kwargs'])
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 85, in get_requires_for_build_wheel
4.225 backend = _build_backend()
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 63, in _build_backend
4.225 obj = import_module(mod_path)
4.225 File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
4.225 return _bootstrap._gcd_import(name[level:], package, level)
4.225 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
4.225 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
4.225 File "<frozen importlib._bootstrap_external>", line 883, in exec_module
4.225 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
4.225 File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 10, in <module>
4.225 import distutils.core
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 1002, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 945, in _find_spec
4.225 File "/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py", line 72, in find_spec
4.225 return self.get_distutils_spec()
4.225 File "/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py", line 77, in get_distutils_spec
4.225 class DistutilsLoader(importlib.util.abc.Loader):
4.225 AttributeError: module 'importlib.util' has no attribute 'abc'
All these errors tend to be issues with different package versions fighting each other. For instance: a package removed a method or moved some functions around in its latest release, and another package that depends on the former is not aware (yet) of those changes.
As in: Package A uses Package B's .do_something method, but Package B's developers rename it to .do_something_better. If you have the latest version of B but an old version of A that is not yet aware of the rename... well... it will crash (as you've seen).
That seems to be what's happening with Python 3.10 and setuptools quite a bit.
TL;DR: you're seeing a quite common (and pesky) versioning issue.
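The last frames of the traceback show this pattern concretely: _distutils_hack (installed alongside the pinned setuptools 49.6.0) reaches for importlib.util.abc, which the Python 3.10 in the image does not expose. Here is a minimal sketch of the fragile access versus the explicit import; the variable names are mine and purely illustrative:

```python
import importlib.abc   # explicit import: the documented home of Loader
import importlib.util

# Fragile: only works if importlib.util happens to expose an `abc` attribute.
# That is an implementation detail, and it is what the traceback trips over.
fragile_access_works = hasattr(importlib.util, "abc")

# Robust: reference the class through the module that documents it.
loader_base = importlib.abc.Loader

print("importlib.util.abc available:", fragile_access_works)
print("importlib.abc.Loader:", loader_base)
```

Running this with the image's python3 should tell you whether the interpreter still offers the attribute that the old setuptools expects.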
This said, this Dockerfile is successfully building:
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
r-base r-base-core r-cran-randomforest \
libinput-dev libgbm-dev liblzma-dev libbz2-dev libicu-dev libblas-dev liblapack-dev \
python3.6 python3-pip python3-setuptools python3-dev && \
rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
# Quote the specifier so the shell does not treat ">51" as an output redirection
RUN pip3 install --upgrade pip wheel "setuptools>51"
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
Notice there's a bunch of -dev packages required, and that I allowed pip, wheel and setuptools to be a bit looser when it comes to versions. Also, since I don't have your requirements.txt file, I had to leave it blank.
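Once the image builds, a quick smoke test can confirm that rpy2 actually finds R at runtime. This is only a sketch: the function name rpy2_smoke_test is hypothetical, and it assumes rpy2 and R are installed as in the Dockerfile above (it returns None instead of failing when they are not).

```python
import importlib.util


def rpy2_smoke_test():
    """Return R's version string via rpy2, or None if rpy2 or R is unusable."""
    # find_spec only checks that the package can be located; it does not import it.
    if importlib.util.find_spec("rpy2") is None:
        return None
    try:
        import rpy2.robjects as ro
        # R.version.string is a length-1 character vector in R.
        return str(ro.r("R.version.string")[0])
    except Exception:
        # Covers a missing or misconfigured R installation.
        return None


print("R as seen by rpy2:", rpy2_smoke_test())
```

You could save it as a script and run it with python3 inside the container; a None result means rpy2 is missing or cannot load R.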
HOWEVER: You are fetching the :latest Ubuntu image. As of October 2023 that means installing Ubuntu 22.04 (codename "Jammy Jellyfish"). The default Python 3 in that image is 3.10 (the paths in your traceback confirm it), yet you seem to be installing Python 3.6. This can lead to issues: if you run apt-get install some_python_package, you could end up with Python 3.6 on your system alongside a version of some_python_package built for Python 3.10, which is not great.
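To see which interpreter python3 really resolves to inside the image, a small check like the following can help. It is a sketch under my own assumptions: the helper name python_matches is hypothetical, and (3, 6) stands in for whatever version your code actually targets.

```python
import sys


def python_matches(expected, actual=None):
    """Return True if the (major, minor) of `actual` equals `expected`.

    `actual` defaults to the running interpreter's version.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


found = tuple(sys.version_info[:2])
print("python3 resolves to %d.%d; matches 3.6: %s"
      % (found[0], found[1], python_matches((3, 6))))
```

Running it during the build or at container start makes a 3.6 versus 3.10 mix-up visible before your own code hits it.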
If you'd rather use Python 3.6, may I suggest you base your Dockerfile on one of the Python Docker images?
For instance, python:3.6.14-bullseye, which is Debian-based (not Ubuntu-based) but contains some tweaks and environment variables geared towards providing a safe environment (or "ecosystem") for Python 3.6:
FROM python:3.6.14-bullseye
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
r-base r-base-core r-cran-randomforest \
libinput-dev libgbm-dev liblzma-dev libbz2-dev libicu-dev libblas-dev liblapack-dev \
&& rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip wheel setuptools
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
There are quite a few more Python Docker images with slightly different features and contents. You might wanna take a look at this article and see which one best fits your needs.
Pinning an image to a specific version rather than to :latest also has the advantage that if (for instance) the Ubuntu Docker image maintainers decide to update what "latest" means from the current 22.04 to, let's say, 24.04, you won't be bitten by an unexpected full O.S. upgrade.
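If you want to catch unpinned base images automatically, a rough check could look like this. It is only a sketch: unpinned_base_images is a hypothetical helper, not part of Docker or any library, and it will also flag tagless FROM lines that name an earlier build stage or scratch.

```python
def unpinned_base_images(dockerfile_text):
    """Return the base images in FROM lines that have no tag or use :latest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=... that may precede the image name.
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if image is None or "@" in image:
            # No image found, or the image is pinned by digest.
            continue
        # Look for the tag after the last "/", so that a registry port
        # (as in localhost:5000/img) is not mistaken for a tag.
        name = image.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else None
        if tag is None or tag == "latest":
            unpinned.append(image)
    return unpinned


print(unpinned_base_images("FROM ubuntu:latest\nWORKDIR /app\n"))
```

Called on the text of your original Dockerfile it would report ubuntu:latest, while python:3.6.14-bullseye would pass.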