
Resolving conflicts in python library dependency versions in apache/airflow docker image (due to dbt-bigquery library)


#15 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

#15 google-cloud-aiplatform 1.16.1 requires google-cloud-bigquery<3.0.0dev,>=1.15.0, but you have google-cloud-bigquery 3.10.0 which is incompatible.

#15 google-ads 18.0.0 requires protobuf!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0, but you have protobuf 3.20.3 which is incompatible.

We are receiving these errors in the docker-compose build logs when building our Apache Airflow image. According to an LLM:

  • The first conflict is between google-cloud-aiplatform and google-cloud-bigquery. The google-cloud-aiplatform library requires a version of google-cloud-bigquery that is less than 3.0.0dev and greater than or equal to 1.15.0, but you have google-cloud-bigquery version 3.10.0 installed which is incompatible.
  • The second conflict is between google-ads and protobuf. The google-ads library requires a version of protobuf that is less than or equal to 3.20.0 and greater than or equal to 3.12.0, excluding versions 3.18.* and 3.19.*, but you have protobuf version 3.20.3 installed which is incompatible.
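To make the two conflicts concrete, here is a minimal, standard-library-only sketch that checks the reported versions against the specifiers quoted in the error messages. The function `satisfies` and its helpers are hypothetical and deliberately simplified: they handle plain `X.Y.Z` versions, `dev` suffixes, `.*` wildcards and the usual comparison operators, but not the full PEP 440 rules. Real tooling (pip included) relies on the `packaging` library's `SpecifierSet` for this.

```python
import re

# Simplified version/specifier handling for illustration only; NOT a full
# PEP 440 implementation (pre-release exclusion rules, local versions and
# epochs are ignored). Real tools use packaging.specifiers.SpecifierSet.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(\.?dev\d*)?")
_CLAUSE_RE = re.compile(r"\s*(~=|==|!=|<=|>=|<|>)\s*(\S+)\s*")


def _release_and_dev(text):
    """Split '3.10.0' or '3.0.0dev' into (release tuple, is_dev_release)."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    return release, match.group(2) is not None


def _sort_key(text):
    """Comparable key: trailing zeros dropped (3.0 == 3.0.0); dev releases sort first."""
    release, is_dev = _release_and_dev(text)
    while release and release[-1] == 0:
        release = release[:-1]
    return release, 0 if is_dev else 1


def _has_prefix(text, prefix):
    """True if the release part of `text` starts with `prefix`, e.g. 3.18.2 and (3, 18)."""
    release, _ = _release_and_dev(text)
    release += (0,) * max(0, len(prefix) - len(release))
    return release[: len(prefix)] == prefix


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    for clause in specifier.split(","):
        match = _CLAUSE_RE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, target = match.groups()
        if target.endswith(".*"):
            if op not in ("==", "!="):
                raise ValueError(f"wildcard not allowed with {op}")
            matched = _has_prefix(version, _release_and_dev(target[:-2])[0])
            ok = matched if op == "==" else not matched
        elif op == "~=":
            # Compatible release: ~=3.0 means >=3.0 and ==3.*
            prefix = _release_and_dev(target)[0][:-1]
            ok = _sort_key(version) >= _sort_key(target) and _has_prefix(version, prefix)
        else:
            v, t = _sort_key(version), _sort_key(target)
            ok = {"==": v == t, "!=": v != t, "<": v < t,
                  "<=": v <= t, ">": v > t, ">=": v >= t}[op]
        if not ok:
            return False
    return True


if __name__ == "__main__":
    # The two conflicts from the build log:
    print(satisfies("3.10.0", "<3.0.0dev,>=1.15.0"))                  # google-cloud-bigquery -> False
    print(satisfies("3.20.3", "!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0"))  # protobuf -> False
```

Both checks fail for the reason the LLM summary gives: 3.10.0 is above the `<3.0.0dev` ceiling, and 3.20.3 is above the `<=3.20.0` ceiling. Swapping in protobuf 3.20.0, for example, makes the second check return True.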

It's worth noting that dbt-bigquery==1.5.0 is a new release from only a few weeks ago.

Here is our Dockerfile:

FROM --platform=linux/amd64 apache/airflow:2.5.3

# install mongodb-org-tools
USER root
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && apt-get install -y mongodb-org-tools
USER airflow

ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

and our requirements.txt:

gcsfs==0.6.1                        # Google Cloud Storage file system interface
ndjson==0.3.1                       # Newline delimited JSON parsing and serialization
pymongo==3.12.1                     # MongoDB driver for Python
dbt-bigquery==1.5.0                 # dbt adapter for Google BigQuery
numpy==1.21.1                       # Numerical computing in Python
pandas==1.3.1                       # Data manipulation and analysis library
billiard                            # Multiprocessing replacement, to avoid "daemonic processes are not allowed to have children" error using Pool

How can we resolve these dependency conflicts? And how can we even tell which dependencies belong to which libraries in our requirements.txt? My assumption is that google-cloud-aiplatform and google-cloud-bigquery are both dependencies of dbt-bigquery; however, if they were dependencies of the same library, I wouldn't expect a dependency conflict.

Edit: some useful logs from the build:

Requirement already satisfied: protobuf>=3.18.3 in /home/airflow/.local/lib/python3.7/site-packages (from dbt-core~=1.5.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (3.20.0)

Collecting google-cloud-bigquery~=3.0
Downloading google_cloud_bigquery-3.10.0-py2.py3-none-any.whl (218 kB)

Requirement already satisfied: proto-plus<2.0.0dev,>=1.15.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.19.6)

Requirement already satisfied: grpcio<2.0dev,>=1.47.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.53.0)

Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.4.1)

Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.3.2)

Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.8.2)

Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.56.4)

Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.48.2)

Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5))

Neither google-cloud-aiplatform nor google-ads appears anywhere in the build logs except in the error message.
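Since pip's message says its resolver "does not currently take into account all the packages that are installed", the packages named in the error were presumably already installed before `pip install -r /requirements.txt` ran. A reverse lookup over the installed metadata can show which distribution declares them as requirements. This is a minimal stdlib sketch; `who_requires` is a hypothetical helper, names are compared after PEP 503 normalization, and, unlike a forward walk, it also reports conditional (extra or marker) requirements.

```python
import re
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport


def _canonical(name):
    """Normalize a project name as in PEP 503 (case-insensitive, -_. equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _requirement_name(requirement):
    """Project name at the start of a requirement string such as 'protobuf<=3.20.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return _canonical(match.group(1)) if match else ""


def who_requires(package, distributions=None):
    """Sorted names of distributions that list `package` among their requirements.

    Conditional requirements (extras, environment markers) are included too.
    """
    target = _canonical(package)
    dists = metadata.distributions() if distributions is None else distributions
    dependents = set()
    for dist in dists:
        name = dist.metadata.get("Name")
        if name and any(_requirement_name(r) == target for r in dist.requires or []):
            dependents.add(name)
    return sorted(dependents)


if __name__ == "__main__":
    # Run inside the image to find what brought in the packages named in the error.
    for package in ("google-cloud-aiplatform", "google-ads"):
        print(package, "is required by:", who_requires(package) or "nothing installed")
```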


Solution

  • The problem arises from conflicts between the Python packages that the OS package manager installs and the dependency graph of your project's packages.

    The short answer is to use the same strategy you would with any Python project: a virtual environment (venv).

    Solution

    Below is a complete working Dockerfile:

    FROM --platform=linux/amd64 apache/airflow:2.5.3-python3.9
    
    # install mongodb-org-tools
    ENV DEBIAN_FRONTEND=noninteractive
    USER root
    RUN apt-get update && apt-get install -y --no-install-recommends gnupg software-properties-common python3-venv && \
        curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
        add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
        apt-get update && apt-get install -y --no-install-recommends mongodb-org-tools
    
    COPY requirements.txt /usr/local/app/requirements.txt
    
    ENV VIRTUAL_ENV=/usr/local/venv
    RUN python3 -m venv $VIRTUAL_ENV
    ENV PATH="$VIRTUAL_ENV/bin:$PATH"
    
    RUN \
        pip install --upgrade --no-cache-dir --no-user pip && \
        pip install --no-cache-dir --no-user -r /usr/local/app/requirements.txt
    # run your app
    

    Note how venv is set up and used here. Just as outside a container, this separates your application's dependencies from the system-installed ones inside the container.
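Inside the container you can confirm that `python3` really resolves to the venv's interpreter (because `$VIRTUAL_ENV/bin` comes first on `PATH`) with a check like the sketch below. It relies on the documented fact that in a venv `sys.prefix` points at the venv while `sys.base_prefix` points at the base installation. The function names are hypothetical; `venv.create` is the standard-library equivalent of `python3 -m venv`.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def new_venv_is_isolated(env_dir):
    """Create a venv (like `python3 -m venv`) and ask its own interpreter whether it is one."""
    # No pip, to keep the demo fast and offline; symlink the interpreter on
    # POSIX, as the `python3 -m venv` command line does by default.
    venv.create(env_dir, with_pip=False, symlinks=(sys.platform != "win32"))
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = Path(env_dir) / bin_dir / "python"
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("current interpreter in a venv:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        print("fresh venv isolated:", new_venv_is_isolated(Path(tmp) / "venv"))
```

Running the first check from the container's default `python3` after the Dockerfile above should report a venv, which is what keeps `pip install -r` away from the packages installed outside it.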

    Notes

    • In this sample I stayed as the root user because the permission issues were getting annoying. In your production file you'll want to use COPY --chown=... and put files in place with the appropriate USER permissions.

    • /usr/local/app/ is just my convention; you can put the files anywhere.

    • Because we prepend the venv to $PATH instead of sourcing its activate script, you have to pass --no-user to pip.

    • I first tried --no-install-recommends with apt-get install to see whether it would exclude the affected dependency. Either way, I left it in, since it's good practice and minimizes image size.

    Detail

    When apt-get install runs, you can see that a number of packages are installed:

    #6 4.003 The following NEW packages will be installed:
    #6 4.003   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
    #6 4.003   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
    #6 4.003   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
    #6 4.003   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
    #6 4.003   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
    #6 4.003   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
    #6 4.003   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
    #6 4.003   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
    #6 4.003   python3-pycurl python3-software-properties shared-mime-info
    #6 4.003   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
    #6 4.003   unattended-upgrades xdg-user-dirs xz-utils
    ...
    #7 6.657 The following NEW packages will be installed:
    #7 6.657   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
    #7 6.657   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
    #7 6.657   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
    #7 6.657   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
    #7 6.657   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
    #7 6.657   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
    #7 6.657   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
    #7 6.657   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
    #7 6.657   python3-pycurl python3-software-properties shared-mime-info
    #7 6.657   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
    #7 6.657   unattended-upgrades xdg-user-dirs xz-utils
    #7 6.658 The following packages will be upgraded:
    #7 6.658   libsystemd0
    

    I didn't track down the exact package causing the problem, but you can see that several python3-* packages are requested for installation. One of these conflicts with your application's dependency graph.
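Picking the python3-* candidates out of an apt-get listing like the one above by hand is error-prone, so a small helper can do it for a saved copy of the build output. This is a hypothetical sketch: `new_python3_packages` only looks at lines under apt's "The following NEW packages will be installed:" heading and assumes apt's usual English wording and BuildKit's `#N <time>` line prefixes.

```python
SAMPLE_LOG = """\
#7 6.657 The following NEW packages will be installed:
#7 6.657   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
#7 6.657   python3-pycurl python3-software-properties shared-mime-info
#7 6.658 The following packages will be upgraded:
#7 6.658   libsystemd0
"""


def new_python3_packages(log_text):
    """Sorted python3-* names from the 'NEW packages will be installed' sections of an apt log."""
    names, collecting = set(), False
    for line in log_text.splitlines():
        if "The following" in line:
            # Start collecting only under the NEW-packages heading; stop at any other heading.
            collecting = "NEW packages will be installed" in line
            continue
        if collecting:
            # BuildKit prefixes such as '#7' and '6.657' never start with 'python3-'.
            names.update(token for token in line.split() if token.startswith("python3-"))
    return sorted(names)


if __name__ == "__main__":
    print(new_python3_packages(SAMPLE_LOG))
```

The resulting list is the set of OS-level Python packages to compare against the project's dependency graph when hunting for the conflicting one.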