Tags: python, docker, pip, dockerfile, docker-multi-stage-build

Docker multi-stage ModuleNotFoundError


I'm building my first multi-stage Docker image (to reduce the image size), following this tutorial: https://pythonspeed.com/articles/multi-stage-docker-python/.

My Dockerfile is pretty simple:

RUN apt-get update

RUN apt-get -y --no-install-recommends install \
      python3 python3-pip python3-venv

RUN python3 -m venv /opt/fwr

ENV PATH="/opt/fwr/bin:$PATH"

COPY requirements.txt .

RUN pip install -r requirements.txt

FROM python:3-alpine3.18 AS build-image

WORKDIR /opt/fwr

COPY --from=compile-image /opt/fwr /opt/fwr

COPY *.py ./

ENV PATH="/opt/fwr/bin:$PATH"

CMD ["-u", "main.py"] 

ENTRYPOINT ["python"]

All stages build fine, but when I try to run the container I get:

Traceback (most recent call last):
  File "/opt/fwr/main.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas' 

The question is: What did I do wrong? Thanks!


Solution

  • It looks like you're using a different base image for compiling than for the final image. Your Dockerfile as shown isn't valid (it's missing the initial FROM line), but the apt-get commands suggest you're using an Ubuntu variant.

    Ubuntu, like most other Linux distributions, is built around the glibc C library. Alpine, to keep the distribution small, uses musl libc instead. Compiled code built under Ubuntu, such as the C extensions inside pandas and NumPy, very often fails to run under Alpine because the two environments use different C libraries and dynamic loaders.
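A related effect of switching base images may explain why the error is specifically a ModuleNotFoundError: a virtual environment is tied to the interpreter that created it. On Linux, venv normally makes bin/python a symlink to that interpreter and installs packages into a version-specific lib/pythonX.Y/site-packages directory. A plausible reading (not verified against these exact images) is that the symlink created under Ubuntu has no target in the Alpine image, so python resolves further down PATH to the image's own interpreter, which never looks inside the venv. The sketch below, a hypothetical check_venv.py using only the standard library and assuming the POSIX venv layout, reports both conditions for a given directory:

```python
import sys
from pathlib import Path


def check_venv(venv_dir: str) -> list[str]:
    """List reasons the *running* interpreter may be unable to use the venv
    at venv_dir. An empty list means both checks passed.

    Assumes the POSIX venv layout (bin/, lib/pythonX.Y/site-packages).
    """
    root = Path(venv_dir)
    problems: list[str] = []

    # bin/python is usually a symlink to the interpreter that created the venv;
    # Path.exists() follows the link, so a dangling symlink counts as missing.
    python = root / "bin" / "python"
    if not python.exists():
        problems.append(f"{python} is missing or points to a non-existent interpreter")

    # Packages live in a directory named after the creating Python's X.Y version.
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = root / "lib" / version / "site-packages"
    if not site_packages.is_dir():
        lib = root / "lib"
        found = sorted(p.name for p in lib.glob("python*")) if lib.is_dir() else []
        problems.append(f"{site_packages} not found; venv has {found or 'no lib/python* dirs'}")

    return problems


if __name__ == "__main__":
    # Hypothetical usage inside the final image: python check_venv.py /opt/fwr
    target = sys.argv[1] if len(sys.argv) > 1 else "/opt/fwr"
    issues = check_venv(target)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("venv at", target, "looks usable by", sys.executable)
```

If the file is in the build context, COPY *.py ./ places it in /opt/fwr, and something like docker run --rm --entrypoint python <image> check_venv.py /opt/fwr would show whether either condition applies.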

    If you use the same base image for compiling things that you use in your final image, you'll find that things build and run as expected:

    FROM python:3-alpine3.18 AS compile-image
    
    RUN apk add alpine-sdk gfortran
    RUN python3 -m venv /opt/fwr
    
    ENV PATH="/opt/fwr/bin:$PATH"
    
    COPY requirements.txt .
    
    RUN pip install -r requirements.txt
    
    FROM python:3-alpine3.18 AS build-image
    
    WORKDIR /opt/fwr
    
    COPY --from=compile-image /opt/fwr /opt/fwr
    
    COPY *.py ./
    
    ENV PATH="/opt/fwr/bin:$PATH"
    
    CMD ["-u", "main.py"] 
    
    ENTRYPOINT ["python"]
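One way to catch this kind of mismatch at build time, rather than at docker run, is to import the required modules in the final stage. Below is a minimal sketch of a hypothetical check_imports.py (standard library only). It actually imports each module instead of merely locating it, so shared-library loading failures, which Python reports as ImportError, are caught as well as missing packages:

```python
import importlib
import sys


def failed_imports(names: list[str]) -> dict[str, str]:
    """Import each module by name and map every failure to its error message."""
    failures: dict[str, str] = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Module names come from the command line, e.g. python check_imports.py pandas
    failures = failed_imports(sys.argv[1:])
    for name, error in failures.items():
        print(f"cannot import {name}: {error}")
    print("interpreter:", sys.executable)
    if failures:
        # A non-zero exit status makes a Dockerfile RUN step fail.
        sys.exit(1)
```

Assuming the file sits next to main.py so that COPY *.py ./ picks it up, a line such as RUN python check_imports.py pandas placed after the ENV PATH line of the final stage would fail the build if pandas can't be imported there.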
    

    NB: At the time of writing, Pandas doesn't provide binary wheels for Alpine (musl), so pandas and its compiled dependencies have to be built from source, which can take a long time. Since optimizing for image size often isn't worth the effort, you can cut your build time substantially by using the standard Debian-based Python image instead:

    FROM python:3
    
    RUN python3 -m venv /opt/fwr
    WORKDIR /opt/fwr
    
    ENV PATH="/opt/fwr/bin:$PATH"
    
    COPY requirements.txt .
    
    RUN pip install -r requirements.txt
    
    COPY *.py ./
    
    CMD ["-u", "main.py"] 
    
    ENTRYPOINT ["python"]
    

    Since nothing needs to be compiled here (on Debian, pip can install prebuilt wheels for pandas and its dependencies), a single-stage image is enough. The total build time is roughly a minute (probably less if the Alpine build hadn't still been running in another terminal).

    Update

    The alpine build finally finished; the final image sizes are:

    pytest-alpine   341 MB
    pytest-debian   1.15 GB
    

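    You might think, "wow, the Debian-based image is so much bigger!", but in practice the real size impact is usually small: Docker stores each image layer only once, so many images built from the same base share a single copy of it, and a host that already has the base only downloads the layers on top.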
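The layer-sharing argument can be made concrete with a little arithmetic. The sketch below is a simplified model (it ignores compression and any layers shared between different bases), and the sizes in the usage lines are hypothetical, not measurements:

```python
def total_storage_mb(base_mb: float, app_layers_mb: list[float], shared: bool = True) -> float:
    """Disk space needed for several images that all build on one base image.

    shared=True models Docker's behaviour: identical layers are stored once,
    so the base is counted a single time. shared=False counts a full copy of
    the base for every image, which is what adding up per-image sizes suggests.
    """
    if shared:
        return base_mb + sum(app_layers_mb)
    return sum(base_mb + layer for layer in app_layers_mb)


# Hypothetical numbers: a ~1 GB base shared by three services with 50 MB of app layers each.
apps = [50.0, 50.0, 50.0]
print(total_storage_mb(1000.0, apps))                # 1150.0
print(total_storage_mb(1000.0, apps, shared=False))  # 3150.0
```

The more images share the base, the less its size matters per image, which is why the raw numbers in the table overstate the cost of the Debian-based image.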