I need Ghostscript and ImageMagick available to do some PDF editing and OCR. I've gotten to the point where I use a Dockerfile, but it seems that gcloud app deploy starts from the beginning each time. Is there a way to speed it up by having the packages installed once?
Here's my Dockerfile:
FROM gcr.io/google-appengine/python
LABEL python_version=python3.6
RUN virtualenv --no-download /env -p python3.6
# Set virtualenv environment variables. This is equivalent to running
# source /env/bin/activate
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/
RUN pip install -r requirements.txt
ADD . /app/
RUN apt-get update
RUN apt-get install imagemagick -y
RUN apt-get install ghostscript
CMD exec gunicorn -b :$PORT main:app
Move those steps earlier in the Dockerfile.
Docker's layer caching means it won't rebuild a step it has already run from the exact same base image. However, as soon as one step invalidates the cache, nothing after it is cached either. In particular, the ADD . /app/ step invalidates the cache if anything at all in your source tree changes.
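A minimal sketch of the principle (the package names match your Dockerfile; the ordering is what matters):

# Rarely-changing steps first: these layers survive source edits
RUN apt-get update && apt-get install -y ghostscript imagemagick
COPY requirements.txt /app/
RUN pip install -r requirements.txt
# A source change invalidates the cache only from this line down
COPY . /app/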
Style-wise, I'd change two other things. First, for similar caching reasons, it's important to run apt-get update and apt-get install in the same RUN step: if they're separate layers, a rebuild can reuse a stale "update" layer whose previously-cached package URLs have since become invalid. Second, I wouldn't bother setting up a Python virtual environment, since a Docker image already provides an isolated filesystem and Python installation.
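To make the first point concrete, this is the failure mode (a sketch of the anti-pattern, not something to copy):

# Anti-pattern: two independent layers
RUN apt-get update                  # can be reused from an old cached build
RUN apt-get install -y imagemagick  # then resolves against stale package lists and 404s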
That ultimately leaves you with:
FROM gcr.io/google-appengine/python
LABEL python_version=python3.6
RUN apt-get update \
&& apt-get install -y ghostscript imagemagick
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app/
# App Engine flexible injects $PORT (8080) at runtime; shell-form CMD lets it expand
EXPOSE 8080
CMD exec gunicorn -b :$PORT main:app