Tags: python, tkinter, icu, docker-multi-stage-build, polyglot

Polyglot Dependencies Cause Trouble During Multi-Stage Containerization (Python) - Docker


I am containerizing a project that uses polyglot, TensorFlow, and other libraries. The multi-stage build fails with errors that come from polyglot's dependencies.

A single-stage build works fine. Here is the Dockerfile for the working single-stage build:

FROM python:3.9 AS python

# Set the working directory in the container for Python
WORKDIR /app_py

# Copy the Python requirements.txt
COPY requirements.txt .

# Install system-level dependencies needed by polyglot
RUN apt-get update \
    && apt-get install -y libicu-dev libcld2-dev

# Install Python dependencies
RUN pip install --timeout 100000 --no-cache-dir -r requirements.txt
RUN python -m nltk.downloader stopwords

# Copy the Python application to the container
COPY sentence_embedder.py \
  k_means_clustering.py \
  bow_and_tf_idf.py \
  compare_final_results.py \
  universal-sentence-encoder-multilingual_3 \
  stopwords \
  qna_tensors.xlsx \
  Chatbot_Questions_CF_Departments.xlsx \
  ./

# Explicitly copy the Universal Sentence Encoder files
COPY universal-sentence-encoder-multilingual_3 /app_py/universal-sentence-encoder-multilingual_3

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define command to start Python app
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "sentence_embedder:app"]  

However, when I switch to a multi-stage build, polyglot's dependencies cause errors.

Here is the Dockerfile for the multi-stage build:

# Stage 1: Builder stage using the full Python image
FROM python:3.9 AS builder

# Set the working directory in the container for Python
WORKDIR /app_py

# Copy the requirements and Python application to the container
COPY requirements.txt \
    sentence_embedder.py \
    k_means_clustering.py \
    bow_and_tf_idf.py \
    compare_final_results.py \
    universal-sentence-encoder-multilingual_3 \
    stopwords \
    qna_tensors.xlsx \
    Chatbot_Questions_CF_Departments.xlsx \
    ./

# Install system-level dependencies needed by polyglot
RUN apt-get update \
    && apt-get install -y libicu-dev libcld2-dev \
    && pip install --timeout 100000 -r requirements.txt \
    && python -m nltk.downloader stopwords \
    && apt-get clean && rm -rf /var/lib/apt/lists/* 
    
# Set the environment variable for shared libraries
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
     
# Stage 2: Production stage
FROM python:3.9 AS production

# Install necessary system-level tools and dependencies
RUN apt-get update && \
    apt-get install -y wget gnupg2 libicu-dev libcld2-dev &&\
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Set the working directory in the container for Python
WORKDIR /app_py

RUN echo "env var for front 2"

# Copy only necessary files and dependencies from the builder stage
COPY --from=builder /usr/lib/x86_64-linux-gnu/libtk8.6.so /usr/lib/x86_64-linux-gnu/
COPY --from=builder /usr/local/lib/python3.9/site-packages/ /usr/local/lib/python3.9/site-packages/
COPY --from=builder /usr/local/bin/gunicorn /usr/local/bin/gunicorn
COPY --from=builder /app_py /app_py


# Set the environment variable for shared libraries
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH

# Add gunicorn binary directory to the $PATH
ENV PATH="/usr/local/bin:${PATH}"

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define command to start Python app
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "sentence_embedder:app"]

The errors usually come from one of two dependencies, tkinter or ICU:

  File "/app_py/sentence_embedder.py", line 7, in <module>
    from bow_and_tf_idf import BoWandTFIDF
  File "/app_py/bow_and_tf_idf.py", line 4, in <module>
    from polyglot.text import Text
  File "/usr/local/lib/python3.9/site-packages/polyglot/text.py", line 9, in <module>
    from polyglot.detect import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/__init__.py", line 1, in <module>
    from .base import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/base.py", line 11, in <module>
    from icu import Locale
  File "/usr/local/lib/python3.9/site-packages/icu/__init__.py", line 3, in <module>
    import tkinter as tk
  File "/usr/local/lib/python3.9/tkinter/__init__.py", line 37, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ImportError: libtcl8.6.so: cannot open shared object file: No such file or directory

OR

  File "/app_py/sentence_embedder.py", line 7, in <module>
    from bow_and_tf_idf import BoWandTFIDF
  File "/app_py/bow_and_tf_idf.py", line 4, in <module>
    from polyglot.text import Text
  File "/usr/local/lib/python3.9/site-packages/polyglot/text.py", line 9, in <module>
    from polyglot.detect import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/__init__.py", line 1, in <module>
    from .base import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/base.py", line 11, in <module>
    from icu import Locale
ImportError: cannot import name 'Locale' from 'icu' (/usr/local/lib/python3.9/site-packages/icu/__init__.py)

OR

  File "/app_py/sentence_embedder.py", line 7, in <module>
    from bow_and_tf_idf import BoWandTFIDF
  File "/app_py/bow_and_tf_idf.py", line 4, in <module>
    from polyglot.text import Text
  File "/usr/local/lib/python3.9/site-packages/polyglot/text.py", line 9, in <module>
    from polyglot.detect import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/__init__.py", line 1, in <module>
    from .base import Detector, Language
  File "/usr/local/lib/python3.9/site-packages/polyglot/detect/base.py", line 11, in <module>
    from icu import Locale
  File "/usr/local/lib/python3.9/site-packages/icu/__init__.py", line 37, in <module>
    from ._icu_ import *
ImportError: libicui18n.so.72: cannot open shared object file: No such file or directory
[2024-01-27 20:47:22 +0000] [10] [ERROR] Exception in worker process
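For the "cannot import name 'Locale' from 'icu'" error, a quick check is which installed distributions own the icu directory in site-packages. This is a diagnostic sketch (which_icu.py is a hypothetical file name). It only uses importlib.metadata.distributions(), so it runs on Python 3.9, and importlib.util.find_spec locates icu without executing its __init__.py, so it does not trigger the tkinter import. If more than one distribution is listed, or PyICU is not among them, another package is probably shadowing PyICU.

```python
"""Report which installed distributions ship a top-level `icu` package.

Hypothetical helper: run it with the same interpreter as the app,
e.g. `python which_icu.py` inside the container.
"""
import importlib.metadata
import importlib.util


def owners_of_top_level(package):
    """Return sorted names of distributions whose RECORD lists files under `package/`."""
    owners = set()
    for dist in importlib.metadata.distributions():
        files = dist.files or []  # None when the distribution has no RECORD file
        if any(f.parts and f.parts[0] == package for f in files):
            owners.add(dist.metadata["Name"])
    return sorted(owners)


if __name__ == "__main__":
    # find_spec() on a top-level name locates the package without importing it.
    spec = importlib.util.find_spec("icu")
    print("icu resolves to:", spec.origin if spec else None)
    print("distributions providing icu/:", owners_of_top_level("icu"))
```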

Note that the multi-stage Dockerfile is the result of multiple iterations: each extra COPY command in the production stage was added to fix a missing dependency. Any help would be appreciated.
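Instead of discovering missing libraries one COPY at a time, you can probe for several at once inside the production image, for example with docker run --rm <image> python check_libs.py (check_libs.py is a hypothetical name). This sketch uses ctypes.CDLL, which goes through the same dynamic loader that raised the "cannot open shared object file" errors. The library names are taken from the errors and COPY lines in this question and are only an example list. A library can also fail because one of its own dependencies is missing; the returned message names the file the loader could not find.

```python
"""Probe shared libraries with the dynamic loader (hypothetical check_libs.py)."""
import ctypes


def missing_shared_libs(names):
    """Return {library name: loader error} for every library that fails to load."""
    missing = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # dlopen(); honours LD_LIBRARY_PATH and the ld.so cache
        except OSError as exc:
            missing[name] = str(exc)
    return missing


if __name__ == "__main__":
    # Example names taken from the errors above; adjust to your own tracebacks.
    wanted = [
        "libtcl8.6.so",
        "libtk8.6.so",
        "libicui18n.so.72",
        "libicuuc.so.72",
        "libicudata.so.72",
    ]
    problems = missing_shared_libs(wanted)
    for name, message in problems.items():
        print(f"MISSING {name}: {message}")
    if not problems:
        print("all libraries loaded")
```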

I am trying to build a lightweight container for a Python application with a multi-stage build. The single-stage build works, but the resulting image is around 3.2 GB. The multi-stage build usually fails on polyglot's dependencies (when importing Text from polyglot.text).
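Because the failure only shows up when a gunicorn worker imports the app, one option is to make the build itself fail early. The sketch below is a hypothetical smoke_test.py that imports the modules named in the tracebacks and exits non-zero if any import fails. It assumes you add COPY smoke_test.py . and RUN python smoke_test.py at the end of the production stage; the module list is only an example.

```python
"""Build-time import smoke test (hypothetical smoke_test.py).

Assumption: run via `RUN python smoke_test.py` in the production stage so a
missing dependency breaks `docker build` instead of a gunicorn worker.
"""
import importlib
import sys

# Example list taken from the tracebacks; extend it with your own modules.
MODULES = ["icu", "polyglot.text"]


def failed_imports(modules):
    """Import each module and return {module name: 'ErrorType: message'} for failures."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, OSError raised by native loaders, etc.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    problems = failed_imports(MODULES)
    for name, message in problems.items():
        print(f"FAILED {name}: {message}", file=sys.stderr)
    sys.exit(1 if problems else 0)
```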


Solution

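  • I've solved the problem and am sharing the fix in case it is useful to someone else. In my case, both the tkinter error and the "from icu import Locale" error happened because the icu package had been installed by mistake for polyglot. Polyglot only needs PyICU, so icu should be removed from requirements.txt. Judging by the tracebacks above, both packages install a top-level icu module, so the wrong one (whose __init__.py imports tkinter) was shadowing PyICU. Hopefully this solves the problem for you too.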

    Errors about missing shared object files, such as libicui18n.so.72, are fixed by copying those libraries from the builder stage:

    COPY --from=builder /usr/lib/x86_64-linux-gnu/libicui18n.so.72 /usr/lib/x86_64-linux-gnu
    COPY --from=builder /usr/lib/x86_64-linux-gnu/libicuuc.so.72 /usr/lib/x86_64-linux-gnu
    COPY --from=builder /usr/lib/x86_64-linux-gnu/libicudata.so.72 /usr/lib/x86_64-linux-gnu
    

    I know there are other approaches, such as using a virtual environment, but this method was more convenient for me.
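To catch the icu/PyICU mix-up before building, a small check over requirements.txt can flag an icu entry. This is a hypothetical helper, not part of pip; it assumes, as the answer says, that only PyICU is needed. Names are compared after PEP 503 normalization, so "ICU" or "Icu" are caught too, while "PyICU" is left alone.

```python
"""Flag requirements.txt lines that pull in the `icu` distribution (hypothetical helper)."""
import re

# Project name at the start of a requirement line, e.g. "icu" in "icu==0.0.1".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_requirements(lines, banned=("icu",)):
    """Return (line number, line) pairs whose project name is in `banned`."""
    banned = {normalize(b) for b in banned}
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.split("#", 1)[0]  # ignore comments; option lines like -r never match
        match = _NAME.match(text)
        if match and normalize(match.group(1)) in banned:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        for number, line in conflicting_requirements(fh):
            print(f"requirements.txt:{number}: remove '{line}' (keep PyICU)")
```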