python docker python-poetry pyarrow

Poetry failing to install Datasets and Transformers in Docker


I am trying to add the Datasets and Transformers libraries to my project, which uses a Poetry environment. Everything works fine locally, but when I try to run it in Docker it fails. This is my error:

19.22   ChefBuildError
19.22
19.22   Backend subprocess exited when trying to invoke build_wheel
19.22
19.22   <string>:34: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
19.22   WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
19.22   Traceback (most recent call last):
19.22     File "/tmp/tmppn9ozs5f/.venv/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
19.22       section = defn.get("tool", {})[tool_name]
19.22                 ~~~~^^^^^^^^^^^
19.22   KeyError: 'setuptools_scm'
19.22   error: command 'cmake' failed: No such file or directory
19.22
19.22
19.22   at /usr/local/venv/lib/python3.11/site-packages/poetry/installation/chef.py:164 in _prepare
19.24       160│
19.24       161│                 error = ChefBuildError("\n\n".join(message_parts))
19.24       162│
19.24       163│             if error is not None:
19.24     → 164│                 raise error from None
19.24       165│
19.24       166│             return path
19.24       167│
19.24       168│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:
19.24
19.24 Note: This error originates from the build backend, and is likely not a problem with poetry but with pyarrow (16.1.0) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "pyarrow (==16.1.0)"'.
19.24
------
failed to solve: process "/bin/sh -c poetry install" did not complete successfully: exit code: 1

This is my docker file:

FROM python:3.11.8-alpine3.19

ARG PRODUCTION_ENV

ENV PYTHONFAULTHANDLER=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONHASHSEED=random \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=false \
    POETRY_CACHE_DIR='/var/cache/pypoetry' \
    POETRY_HOME='/usr/local' \
    POETRY_VERSION=1.8.2

RUN apk --no-cache add curl
RUN curl -sSL https://install.python-poetry.org | python3 -

WORKDIR /app

COPY . /app

RUN poetry install 

I have tried running pip wheel --no-cache-dir --use-pep517 "pyarrow (==16.1.0)" as recommended, but it just produced another error. I also tried installing cmake, but it didn't change anything.

Error when trying to install pyarrow:

Building wheel for pyarrow (pyproject.toml): started
13.14   Building wheel for pyarrow (pyproject.toml): finished with status 'error'
13.18   error: subprocess-exited-with-error
13.18
13.18   × Building wheel for pyarrow (pyproject.toml) did not run successfully.
13.18   │ exit code: 1
13.18   ╰─> [299 lines of output]
13.18       <string>:34: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
13.18       WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
13.18       Traceback (most recent call last):
13.18         File "/tmp/pip-build-env-9hzju8ak/overlay/lib/python3.10/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
13.18           section = defn.get("tool", {})[tool_name]
13.18       KeyError: 'setuptools_scm'

Solution

  • Explanation

    As David Maze mentioned, Alpine-based Python images cannot use the standard manylinux wheels published on PyPI. Those wheels are built against glibc, while Alpine uses musl libc instead.

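    This means pip (and therefore Poetry) falls back to building such packages from source, which is slower and produces a larger image. For pyarrow, a source build also needs cmake and the Arrow C++ libraries, which is why installing cmake alone is unlikely to help. See this blog post for more information.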
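To see the glibc/musl difference for yourself, a sketch like the following prints the wheel tags pip accepts inside the Alpine image. It relies on pip's own "pip debug" command, whose output pip documents as unstable, so treat it as a diagnostic only. Running the same step on python:3.11.8 should, by contrast, list manylinux_* tags.

```dockerfile
# Sketch: list the wheel platform tags pip will accept in the Alpine image.
# Build with `docker build --progress=plain .` so the RUN output is shown.
FROM python:3.11.8-alpine3.19

# On a musl-based image this should show musllinux_* tags and no
# manylinux_* tags, so glibc-only (manylinux) wheels are never selected.
RUN pip debug --verbose
```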

    You can confirm the problem simply by running pip install pyarrow in the image:

    Failed to build pyarrow
    ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects
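A quicker variant of that check, sketched below, forbids source builds with pip's --only-binary=:all: option. Instead of failing deep inside the cmake step, pip should then stop at resolution, which shows directly that no wheel for pyarrow 16.1.0 matches this image.

```dockerfile
# Sketch: confirm that no prebuilt pyarrow wheel matches the Alpine image.
FROM python:3.11.8-alpine3.19

# --only-binary=:all: disables source builds, so if no compatible
# (musllinux) wheel exists, pip should fail with
# "No matching distribution found" rather than attempting a compile.
RUN pip install --only-binary=:all: pyarrow==16.1.0
```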
    

    Solution

    In short, you should consider switching your base image to plain python:3.11.8 (Debian-based, with glibc), which can use the prebuilt pyarrow wheel out of the box. That's the easiest solution and, in my opinion, also the best one, given the performance implications.
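Applied to the Dockerfile from the question, that change might look like the sketch below, assuming nothing else in the project depends on Alpine. Apart from the FROM line, the only edit is dropping the apk step: apk does not exist on the Debian-based image, and the full (non-slim) python image already ships curl.

```dockerfile
# Sketch: the question's Dockerfile with only the base image swapped.
# python:3.11.8 is Debian-based (glibc), so Poetry can install the
# prebuilt manylinux pyarrow wheel instead of compiling from source.
FROM python:3.11.8

ARG PRODUCTION_ENV

ENV PYTHONFAULTHANDLER=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONHASHSEED=random \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=false \
    POETRY_CACHE_DIR='/var/cache/pypoetry' \
    POETRY_HOME='/usr/local' \
    POETRY_VERSION=1.8.2

# No `apk add curl` here: the full Debian image already includes curl.
RUN curl -sSL https://install.python-poetry.org | python3 -

WORKDIR /app

COPY . /app

RUN poetry install
```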

    If you really, really need to use Alpine, see this question: How to install pyarrow on an Alpine Docker image?