Tags: python, docker, psycopg2, alpine-linux, libpq

Lightweight Alpine Docker container with psycopg2


I have a few services that run Python 3.7 with Flask and only require a few extra libraries. One of them is psycopg2, which they need to connect to PostgreSQL.

Installing psycopg2 on Alpine is not very difficult in itself, but I had trouble finding documentation on the matter. I managed to put together a Dockerfile that runs OK. Its biggest downside is the image size: about 355MB, which is just too heavy.

This is my initial Dockerfile before any optimization:

FROM python:3.7-alpine

ENV PATH /usr/local/bin:$PATH

ENV LANG C.UTF-8

RUN mkdir -p /usr/src/app

COPY requirements.txt /usr/src/app/

RUN apk update \
    && apk add postgresql-dev \
    && apk add --virtual temp1 gcc python3-dev musl-dev \
    && pip install --upgrade pip \
    && pip install psycopg2==2.8.4

RUN pip install -r /usr/src/app/requirements.txt

RUN apk del temp1

COPY . /usr/src/app

WORKDIR /usr/src/app

EXPOSE 6000

ENTRYPOINT ["python3"]

CMD ["-m", "server"]

And my requirements.txt:

psycopg2 == 2.8.4
connexion == 1.1.15
python_dateutil == 2.6.0
loguru~=0.4.1
flask~=1.1.2
six~=1.14.0
Werkzeug==0.16.1
pymongo
PyYAML == 5.3
setuptools == 45.1.0
flask_testing == 0.7.1
mo-future>=3
pyparsing==2.3.1
mo_files
pycryptodomex
ldap3

After some testing, I found that the steps that add the most to the image size are:

  • Installing psycopg2 and postgresql-dev: these two alone account for 220MB
  • Installing the requirements: up to 60MB
  • Upgrading pip: adds 15MB to the final image

Things I tried in order to reduce its size:

  • Installing postgresql-dev as a build dependency and removing it from the image once psycopg2 is built. Removing postgresql-dev makes psycopg2 fail with an error saying the file libpq.so.5 cannot be found.
  • Removing the pip upgrade statement. The upgrade isn't required for the image to work, but I'd like to keep pip up to date.

I'm going to try to answer these questions:

  • First of all, how to install psycopg2 without wasting so much space
  • Which best practices I should apply to my Dockerfile, regarding both space reduction and the security of the container

Solution

    Reducing psycopg2 installation size

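    The first thing I wanted to do was remove postgresql-dev from the container while still being able to use psycopg2. The only file that seems to be missing once it is removed is libpq.so.5, which is provided by the Alpine package libpq.

    So libpq can be installed as a permanent runtime dependency, while postgresql-dev is installed only as a temporary build dependency and deleted once psycopg2 is built. This way we can still build psycopg2 and save practically all the space it used before.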
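
A quick way to confirm that psycopg2 still finds libpq after postgresql-dev has been removed is a small smoke test run inside the container. The sketch below is only an illustration: the function name libpq_report is made up for this example, and it assumes psycopg2 2.7 or later, which exposes __libpq_version__ and extensions.libpq_version().

```python
import importlib


def libpq_report():
    """Return (build_version, runtime_version) of libpq, or None if psycopg2 cannot be imported."""
    try:
        psycopg2 = importlib.import_module("psycopg2")
    except ImportError as exc:
        # A missing libpq.so.5 typically surfaces here, as an ImportError.
        print(f"psycopg2 failed to import: {exc}")
        return None
    # __libpq_version__ is the libpq version psycopg2 was compiled against;
    # extensions.libpq_version() is the version of the library loaded at runtime.
    return psycopg2.__libpq_version__, psycopg2.extensions.libpq_version()


print(libpq_report())
```

If the function prints an import error and returns None, the runtime library is most likely missing from the image; a pair of version numbers means psycopg2 loaded libpq successfully.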

    Improving the efficiency of the Dockerfile's steps

    I tried to minimize the number of steps in the Dockerfile so the final image is lighter. In particular, the build dependencies are installed and removed within a single RUN instruction, because files deleted in a later instruction still take up space in the earlier layer. Passing --no-cache to apk and --no-cache-dir to pip also reduces the space used for caches. Finally, declaring a variable that groups all the build dependencies keeps things cleaner.

    I also wrote a more careful .dockerignore to save even more space. Tools like tree can help you find files in your container that aren't necessary.
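
If tree is not installed in the container, a few lines of Python can do a similar job. This is a minimal sketch, not the tool I actually used: largest_files is a hypothetical helper that walks a directory and lists its biggest files, which are good candidates for .dockerignore.

```python
import os


def largest_files(root, limit=10):
    """Return the `limit` largest regular files under `root` as (size_bytes, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that a link's target is not counted a second time.
            if os.path.islink(path):
                continue
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return sorted(sizes, reverse=True)[:limit]


if __name__ == "__main__":
    for size, path in largest_files("."):
        print(f"{size / 1024:10.1f} KiB  {path}")
```

Run from /usr/src/app inside the container, it prints the ten largest files that were copied into the image.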

    Adding basic security

    Based on this fine article, I was able to specify a non-root user for my container that doesn't have the ability to modify the container.
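
To double-check that the service really runs as that user, the application could verify its own UID at startup. The following is only a sketch under that assumption; is_root and require_non_root are hypothetical helpers, not part of Flask or of my service.

```python
import os


def is_root(uid=None):
    """Return True if `uid` (default: the effective UID of this process) is root."""
    if uid is None:
        uid = os.geteuid()
    return uid == 0


def require_non_root(uid=None):
    """Raise RuntimeError when running as root; meant to be called once at startup."""
    if is_root(uid):
        raise RuntimeError(
            "refusing to run as root; check the USER instruction in the Dockerfile"
        )
```

Calling require_non_root() before the server starts makes a missing or misplaced USER instruction fail loudly instead of going unnoticed.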

    Final version

    This is the Dockerfile I ended up with. The image went down from 355MB to 135MB, which isn't exactly perfect but is a lot better.

    FROM python:3.7-alpine
    
    ENV PATH /usr/local/bin:$PATH
    ENV LANG C.UTF-8
    ENV USER=prodUser UID=12345 GID=23456
    
    RUN mkdir -p /usr/src/app
    
    COPY requirements.txt /usr/src/app/
    
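    # Install the libpq runtime library, then build the requirements and remove
    # the build dependencies in the same layer so they never reach the image.
    # apk update is not needed: --no-cache fetches a fresh index without storing it.
    RUN buildDeps='gcc python3-dev musl-dev postgresql-dev' \
        && apk add --no-cache libpq \
        && apk add --no-cache --virtual temp1 $buildDeps \
        && pip install --no-cache-dir -r /usr/src/app/requirements.txt \
        && apk del temp1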
    
    COPY . /usr/src/app
    
    WORKDIR /usr/src/app
    
    RUN addgroup --gid "$GID" "$USER" \
      && adduser \
      --disabled-password \
      --gecos "" \
      --ingroup "$USER" \
      --uid "$UID" \
      "$USER"
    USER $USER
    
    EXPOSE 6000
    
    ENTRYPOINT ["python3"]
    
    CMD ["-m", "server"]
    

    Next steps

    • As the previously mentioned article suggests, I'm going to do some research on gunicorn and nginx for production purposes.
    • I'm going to do some testing on the recommended packages installed by the requirements.txt file and try to remove the ones I don't need.
    • I could try to reduce the number of steps defined in the Dockerfile even further.

    Final notes

    I'm still new to working with Docker, so any advice or suggested changes are welcome!