Tags: postgresql, docker, dockerfile, docker-volume

Postgres Dockerfile exploration - VOLUME statement usage


I am looking at a sample Dockerfile to see how VOLUME is used, and I came across the following lines from https://github.com/docker-library/postgres/blob/master/Dockerfile-alpine.template:

ENV PGDATA /var/lib/postgresql/data
# this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA"
VOLUME /var/lib/postgresql/data

What is the purpose of using a volume here? Here is my understanding; please confirm:

  1. Create the directory pointed to by $PGDATA in the image file system.
  2. Declare it as a VOLUME, so that a predefined directory is exposed for the container to use and any content created later, when docker-entrypoint.sh populates it, ends up there.

What if the VOLUME instruction were not defined? Without it, it might be more laborious for someone to figure out where to keep custom changes.


Solution

  • A VOLUME is defined here, so when you start a container from this image, a new anonymous volume is created.

    The volume holds your sensitive data, so it is all you need to "persist" during the normal/soft Docker image lifecycle.

    Usually, when the maintainers of a Docker image already know where the data worth keeping is located (as here), they will mark that folder with VOLUME in the Dockerfile. As mentioned, this creates an anonymous volume at runtime, but it also makes you aware (via docker inspect or by reading the Dockerfile) of where the volumes for persistence are located.

    In production you will usually use a named volume or a path (bind) mount in your docker-compose file, mounted to this very folder.
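As a small illustration of the docker inspect route, the sketch below prints the mount points an image declares with VOLUME. The function name declared_volumes is hypothetical, and the image is assumed to be available locally already.

```shell
# Print the VOLUME mount points recorded in an image's configuration, as JSON.
# "declared_volumes" is a hypothetical helper name, not part of Docker.
# $1 is an image reference that has already been pulled, e.g. postgres:alpine.
declared_volumes() {
  docker image inspect --format '{{ json .Config.Volumes }}' "$1"
}
```

Calling declared_volumes postgres:alpine should list the data directory declared in the Dockerfile quoted in the question.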

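    docker-compose.yml with a named volume (this goes under the service; mydbdata must also be declared under the top-level volumes key)

    volumes:
      - mydbdata:/var/lib/postgresql/data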
    

    docker-compose.yml with a path (bind mount, also under the service)

    volumes:
      - ./local/path/data:/var/lib/postgresql/data
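For comparison, the named-volume mapping can also be sketched with plain docker commands. The volume name mydbdata is taken from the compose example; the function name start_db, the container name mydb, the image tag and the password value are placeholders chosen for illustration.

```shell
# Sketch: start Postgres with its data directory on a named volume.
# "start_db", "mydb" and the password are placeholders for illustration only.
start_db() {
  # Create the named volume up front (docker run would also create it on demand).
  docker volume create mydbdata
  # Mount the named volume on the directory the image declares as a VOLUME.
  docker run -d --name mydb \
    -e POSTGRES_PASSWORD=change-me \
    -v mydbdata:/var/lib/postgresql/data \
    postgres:alpine
}
```

Because the volume has a name, a container created later with the same -v mapping sees the same data.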
    

    There are also downsides to declaring VOLUME in the Dockerfile, which I will not elaborate on here, but the main reason for doing it is the "lifetime" of the data.

    Without VOLUME in the Dockerfile (and with no volume mapping of your own), running

    docker-compose up -d
    # do something, manipulate the data
    docker-compose down
    
    # all your data would be lost when starting again
    docker-compose up -d
    

    would remove not only the running container but also all your DB data, which might not be what you intended (you just wanted to recreate the container).

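    With VOLUME in the Dockerfile, the anonymous volume is kept even across docker-compose down (unless you pass -v). Note, however, that a newly created container gets a fresh anonymous volume rather than reattaching the old one, so use one of the named volume or path mappings above if the data must be picked up again automatically.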
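To observe this lifetime behaviour yourself, a rough sketch along the following lines may help. It assumes a docker-compose.yml in the current directory; the three function names are hypothetical helpers, not Docker commands.

```shell
# Hypothetical helpers; run them from the directory holding docker-compose.yml.

# Recreate the containers. "down" removes containers and networks,
# but leaves volumes on disk.
recreate_containers() {
  docker-compose down
  docker-compose up -d
}

# List volumes that no container references any more, for example the
# anonymous volume left behind by a removed container.
list_orphaned_volumes() {
  docker volume ls --filter dangling=true
}

# Remove the containers together with their volumes (named volumes declared
# in the compose file and anonymous volumes attached to the containers).
reset_with_volumes() {
  docker-compose down -v
}
```

Running recreate_containers followed by list_orphaned_volumes should show whether an old anonymous volume was left behind; reset_with_volumes is the destructive variant and deletes the data.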