
docker volume container strategy


Let's say you are trying to dockerise a database (CouchDB, for example). Then there are at least two assets you might consider volumes for:

  • database files
  • log files

Let's further say you want to keep the db-files private but want to expose the log-files for later processing.

As far as I understand the documentation, you have two options:

  1. First option

    • define managed volumes for both the log files and the db files within the db-image
    • import these in a second container (you will get both) and work with the logs
  2. Second option

    • create data container with a managed volume for the logs
    • create the db-image with a managed volume for the db-files only
    • import logs-volume from data container when running db-image
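For concreteness, the second option might look roughly like the sketch below. The names logs-data, my-couchdb-image and my-log-processor, the busybox image and the /logs/db path are all placeholders chosen for illustration, and the db-image is assumed to declare its own volume for the db files.

```shell
# Hypothetical names: logs-data, my-couchdb-image, my-log-processor.

# Data container whose only job is to own the managed volume for the logs.
create_log_data_container() {
  docker create -v /logs/db --name logs-data busybox /bin/true
}

# Database container: imports the log volume from the data container and
# keeps its db-file volume (assumed to be declared in the image) to itself.
run_database() {
  docker run -d --name db --volumes-from logs-data my-couchdb-image
}

# Log processor: imports only the log volume, read-only, so it never
# sees the db files.
run_log_processor() {
  docker run -d --volumes-from logs-data:ro my-log-processor
}
```

Calling the three functions in that order would create the data container first, then start the database and the log processor against it.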

Two questions:

  1. Are both options really valid/possible?
  2. What is the better way to do it?

br volker


Solution

  • The answer to question 1 is that yes, both are valid and possible.

    My answer to question 2 is that I would consider a different approach entirely. Which one to choose depends on whether this is a mission critical system where data loss must be avoided.

    Mission critical

    If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.

    So taking the database files as an example, you could imagine these steps:

    1. Create a reliable disk e.g. NFS that is backed-up on a regular basis
    2. Attach this disk to your Docker host
    3. Bind mount this disk into your database container, which then writes its database files to this disk.

    So following the above example, let's say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:

    docker run -d -v /reliable/disk:/data/db my-database-image

    This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
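As a rough sketch of steps 2 and 3, and of what recovery on a replacement host might involve, something like the following could work. The NFS export nfs.example.com:/exports/db is a made-up placeholder, and the function names are only for illustration.

```shell
# Assumption (not from the steps above): the NFS export lives at
# nfs.example.com:/exports/db. Override NFS_EXPORT for a real server.
NFS_EXPORT="${NFS_EXPORT:-nfs.example.com:/exports/db}"

# Step 2: attach the reliable disk to the Docker host.
mount_reliable_disk() {
  sudo mkdir -p /reliable/disk
  sudo mount -t nfs "$NFS_EXPORT" /reliable/disk
}

# Step 3: bind mount the disk into the database container.
run_database() {
  docker run -d -v /reliable/disk:/data/db my-database-image
}
```

On a new host that can reach the share, running mount_reliable_disk followed by run_database should bring the database back up against the same files.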

    You can do exactly the same thing for the database logs:

    docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image

    Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:

    docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor

    This would be my recommended approach if this is a mission critical system.

    Not mission critical

    If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at the Docker volume API, which is meant precisely for what you want to do: creating and managing volumes for data that should live beyond the lifecycle of a container.

    The nice thing about the docker volume command is that it lets you create named volumes, and if you name them well it can be quite obvious to people what they are used for:

    docker volume create db-data
    docker volume create db-logs

    You can then mount these volumes into your container from the command line:

    docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image

    These volumes will survive beyond the lifecycle of your container and are stored on the filesystem of your Docker host. You can use:

    docker volume inspect db-data

    to find out where the data is being stored and back up that location if you want to.
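One possible way to script that backup is sketched below. The helper names are invented for illustration, and using a throwaway busybox container to read the volume is an assumption on my part rather than the only way to do it; you could equally archive the mountpoint directly on the host.

```shell
# Print the host directory that backs a named volume.
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Archive a named volume into <volume>.tar in the current directory.
# The volume is mounted read-only into a temporary container, so the
# backup does not need direct access to the host path.
backup_volume() {
  docker run --rm \
    -v "$1":/source:ro \
    -v "$(pwd)":/backup \
    busybox tar cf "/backup/$1.tar" -C /source .
}
```

For example, backup_volume db-data would be expected to leave a db-data.tar in the directory you ran it from.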

    You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.
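A minimal sketch of what that Compose file might contain is shown below, written out from the shell so it can be pasted as-is. It reuses the image and volume names from the commands above; the service names db and log-processor are my own placeholders.

```shell
# Write a Compose file that declares both named volumes and mounts them
# into the database, plus a read-only mount of the logs for a processor.
cat > docker-compose.yml <<'EOF'
services:
  db:
    image: my-database-image
    volumes:
      - db-data:/db/data
      - db-logs:/logs/db
  log-processor:
    image: my-log-processor
    volumes:
      - db-logs:/logs/db:ro

volumes:
  db-data:
  db-logs:
EOF
```

With Docker Compose installed, running docker compose up -d in the same directory should then create the volumes and start both containers.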