Tags: python, google-cloud-dataflow, python-poetry

Unable to install poetry in dataflow


I am getting this error when running any Poetry executable:

Traceback (most recent call last):
  File "/root/.local/bin/poetry", line 5, in <module>
    from poetry.console.application import main
  File "/root/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/console/application.py", line 11, in <module>
    from cleo.application import Application as BaseApplication
ModuleNotFoundError: No module named 'cleo'

My container is built from this Dockerfile:

FROM gcr.io/dataflow-templates-base/python38-template-launcher-base:flex_templates_base_image_release_20230508_RC00
ARG DIR=/dataflow/template
ARG dataflow_file_path
ARG PROJECT_ID
# environment to pull the right containers
ARG ENV
ARG TOKEN
ENV COMPOSER_$ENV=1

# copying over necessary files
RUN mkdir -p ${DIR}
WORKDIR ${DIR}
COPY transform/dataflow/${dataflow_file_path}.py beam.py
COPY deploy/dataflow/poetry.lock .
COPY deploy/dataflow/pyproject.toml .

# env var in order to use custom lib, for more info, see:
# https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#set_required_dockerfile_environment_variables
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${DIR}/beam.py"
ENV FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES=""
ENV FLEX_TEMPLATE_PYTHON_PY_OPTIONS=""
ENV PIP_NO_DEPS=True

# install poetry
RUN curl -sSL https://install.python-poetry.org | python -
ENV PATH "/root/.local/bin/:${PATH}"
RUN poetry --version

I have tried uninstalling it and the suggestions from:

These aren't really applicable because they concern Poetry executables in a non-Docker environment, but I'm not sure what else to try. I have a Dataflow SDK image that was built from apache/beam_python3.8_sdk:2.45.0 with the same logic, and it works.

I left out the last RUN command and built the container. These are the outputs of some checks:

❯ docker run --rm --entrypoint /bin/bash dataflow -c 'which poetry'
/root/.local/bin/poetry
❯ docker run --rm --entrypoint /bin/bash dataflow -c 'poetry'
Traceback (most recent call last):
  File "/root/.local/bin/poetry", line 5, in <module>
    from poetry.console.application import main
  File "/root/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/console/application.py", line 11, in <module>
    from cleo.application import Application as BaseApplication
ModuleNotFoundError: No module named 'cleo'
❯ docker run --rm --entrypoint /bin/bash dataflow -c 'which python'
/usr/local/bin/python

My assumption is that the Poetry executable imports from a virtualenv that the other libraries aren't installed into.
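
One way to check that assumption is to look at the launcher's shebang line, which names the interpreter the script runs under. The following is a minimal sketch; the function name `launcher_interpreter` is hypothetical, and it assumes the launcher is a plain text script rather than a compiled binary.

```python
def launcher_interpreter(script_path):
    """Return the interpreter path from a script's shebang line, or None.

    Assumption: the file is a text script whose first line starts with '#!'.
    """
    with open(script_path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return None
    # Drop the '#!' prefix and any padding after it.
    return first_line[2:].strip()
```

Calling `launcher_interpreter("/root/.local/bin/poetry")` inside the container and comparing the result with the output of `which python` would show whether Poetry runs under its own virtualenv interpreter. The traceback paths under `/root/.local/share/pypoetry/venv` already point that way.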

UPDATE:

I went down a rabbit hole and installed Poetry and its missing modules with pip directly:

RUN pip install --no-cache-dir poetry cleo rapidfuzz importlib_metadata zipp crashtest

After that, running poetry --version worked, but poetry config virtualenvs.create false or any other command throws this error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/poetry", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/poetry/console/application.py", line 409, in main
    exit_code: int = Application().run()
  File "/usr/local/lib/python3.8/site-packages/cleo/application.py", line 338, in run
    self.render_error(e, io)
  File "/usr/local/lib/python3.8/site-packages/poetry/console/application.py", line 180, in render_error
    self.set_solution_provider_repository(self._get_solution_provider_repository())
  File "/usr/local/lib/python3.8/site-packages/poetry/console/application.py", line 398, in _get_solution_provider_repository
    from poetry.mixology.solutions.providers.python_requirement_solution_provider import (  # noqa: E501
  File "/usr/local/lib/python3.8/site-packages/poetry/mixology/__init__.py", line 5, in <module>
    from poetry.mixology.version_solver import VersionSolver
  File "/usr/local/lib/python3.8/site-packages/poetry/mixology/version_solver.py", line 8, in <module>
    from poetry.core.packages.dependency import Dependency
ModuleNotFoundError: No module named 'poetry.core'
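
Having to add modules one by one and then hitting yet another missing module suggests that Poetry's declared dependencies are not being installed alongside it. The sketch below lists which declared dependencies of an installed distribution are absent. It assumes Python 3.8+ (for `importlib.metadata`), the helper names are hypothetical, and the name parsing is deliberately crude: it ignores environment markers such as `python_version`, so it may over-report.

```python
import re
from importlib import metadata

# Matches the project name at the start of a requirement string.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(requirements):
    """Extract bare project names, skipping requirements tied to an extra."""
    names = []
    for req in requirements or []:
        if "extra ==" in req or "extra==" in req:
            continue  # only needed when an optional extra is requested
        match = _NAME.match(req)
        if match:
            names.append(match.group(1))
    return names


def missing_requirements(dist_name):
    """Return declared dependencies of dist_name that are not installed."""
    missing = []
    for name in requirement_names(metadata.requires(dist_name)):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `missing_requirements("poetry")` with the interpreter that Poetry was installed into would give the full list at once, instead of discovering each module through a new traceback.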

Solution

  • I found the issue. It appears to be the ordering of instructions in the Dockerfile when building a Dataflow Flex Template.

    This will work:

    # THIS WILL BE MOVED
    RUN curl -sSL https://install.python-poetry.org | python3 -
    ENV PATH "/root/.local/bin/:${PATH}"
    
    # copying over necessary files
    RUN mkdir -p ${DIR}
    WORKDIR ${DIR}
    COPY transform/dataflow/${dataflow_file_path}.py beam.py
    COPY deploy/dataflow/poetry.lock .
    COPY deploy/dataflow/pyproject.toml .
    
    # env var in order to use custom lib, for more info, see:
    # https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#set_required_dockerfile_environment_variables
    ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${DIR}/beam.py"
    ENV FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES=""
    ENV FLEX_TEMPLATE_PYTHON_PY_OPTIONS=""
    ENV PIP_NO_DEPS=True
    

    But this will not:

    # copying over necessary files
    RUN mkdir -p ${DIR}
    WORKDIR ${DIR}
    COPY transform/dataflow/${dataflow_file_path}.py beam.py
    COPY deploy/dataflow/poetry.lock .
    COPY deploy/dataflow/pyproject.toml .
    
    # env var in order to use custom lib, for more info, see:
    # https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#set_required_dockerfile_environment_variables
    ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${DIR}/beam.py"
    ENV FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES=""
    ENV FLEX_TEMPLATE_PYTHON_PY_OPTIONS=""
    ENV PIP_NO_DEPS=True
    
    
    # THIS MOVED
    RUN curl -sSL https://install.python-poetry.org | python3 -
    ENV PATH "/root/.local/bin/:${PATH}"
    

    Looking a bit deeper, it may be that the environment variables are at play, although I'm not quite sure exactly how. The two env dumps below compare the working image with the one that errors out.

    This is the env of the working image:

    ❯ docker run --rm --entrypoint /bin/bash dataflow -c 'env'
    _=/usr/bin/env
    PATH=/root/.local/bin/:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/gcloud/google-cloud-sdk/bin
    PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/1a96dc5acd0303c4700e02655aefd3bc68c78958/public/get-pip.py
    PYTHON_GET_PIP_SHA256=d1d09b0f9e745610657a528689ba3ea44a73bd19c60f4c954271b790c71c2653
    LD_LIBRARY_PATH=/usr/local/lib
    PYTHON_PIP_VERSION=22.0.4
    SHLVL=0
    LANG=C.UTF-8
    HOME=/root
    PYTHON_SETUPTOOLS_VERSION=57.5.0
    PWD=/
    CLOUDSDK_CORE_DISABLE_PROMPTS=yes
    PYTHON_VERSION=3.8.16
    

    This is the env of the image that errors out:

    ❯ docker run --rm --entrypoint /bin/bash dataflow -c 'env'
    _=/usr/bin/env
    PIP_NO_DEPS=True
    PATH=/root/.local/bin/:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/gcloud/google-cloud-sdk/bin
    PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/1a96dc5acd0303c4700e02655aefd3bc68c78958/public/get-pip.py
    PYTHON_GET_PIP_SHA256=d1d09b0f9e745610657a528689ba3ea44a73bd19c60f4c954271b790c71c2653
    LD_LIBRARY_PATH=/usr/local/lib
    PYTHON_PIP_VERSION=22.0.4
    SHLVL=0
    FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES=
    FLEX_TEMPLATE_PYTHON_PY_OPTIONS=
    FLEX_TEMPLATE_PYTHON_PY_FILE=/dataflow/template/beam.py
    LANG=C.UTF-8
    HOME=/root
    PYTHON_SETUPTOOLS_VERSION=57.5.0
    PWD=/dataflow/template
    CLOUDSDK_CORE_DISABLE_PROMPTS=yes
    PYTHON_VERSION=3.8.16
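
To make the comparison between the two dumps easier to read, they can be diffed programmatically. This is a minimal sketch: the function names are hypothetical, and the sample strings are an abridged subset of the two `env` outputs above, not the full dumps.

```python
def parse_env(dump):
    """Turn the text output of `env` into a dict of KEY -> value."""
    env = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key] = value
    return env


def diff_env(working, failing):
    """Return (only_in_working, only_in_failing, changed) for two env dicts."""
    only_working = {k: working[k] for k in working.keys() - failing.keys()}
    only_failing = {k: failing[k] for k in failing.keys() - working.keys()}
    changed = {
        k: (working[k], failing[k])
        for k in working.keys() & failing.keys()
        if working[k] != failing[k]
    }
    return only_working, only_failing, changed


# Abridged from the two dumps above.
WORKING = """
LANG=C.UTF-8
HOME=/root
PWD=/
PYTHON_VERSION=3.8.16
"""

FAILING = """
PIP_NO_DEPS=True
FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES=
FLEX_TEMPLATE_PYTHON_PY_OPTIONS=
FLEX_TEMPLATE_PYTHON_PY_FILE=/dataflow/template/beam.py
LANG=C.UTF-8
HOME=/root
PWD=/dataflow/template
PYTHON_VERSION=3.8.16
"""

only_working, only_failing, changed = diff_env(parse_env(WORKING), parse_env(FAILING))
print(sorted(only_failing))  # variables present only in the failing image
print(changed)               # variables whose values differ
```

In this comparison, `PIP_NO_DEPS=True` stands out. pip reads `PIP_<OPTION>` environment variables as command-line options, so this setting is equivalent to passing `--no-deps`. If the Poetry installer runs pip inside its own virtualenv, which the traceback paths suggest, Poetry would be installed without cleo and its other dependencies whenever the variable is set before the install step. That would be consistent with the ordering behaviour described above, but it is an inference from the dumps rather than something confirmed here.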