
Gitlab CICD on-premise instance running out of storage when using docker runners and cache


On an on-premise installation I've configured docker runners to execute the CICD jobs.

After some days we've noticed that the disk fills up quickly in the following paths:

/lib/docker/volumes/_data
/var/lib/docker/overlay2

Is there an automated way to purge that info?


Solution

  • What takes up space

    Over time, the runner accumulates the various versions of images referenced in the image: keys of the jobs it runs, as well as cache volumes if caching is enabled. The runner may also occasionally fail to clean up containers, which in turn prevents their images and volumes from being pruned. To keep this from filling the disk, you need to periodically clean up containers, images, and volumes.
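    Before scheduling any cleanup, it can help to confirm what is actually consuming the space. Docker can break its disk usage down by images, containers, local volumes, and build cache:

    ```shell
    # Summarize docker disk usage by category (images, containers, volumes, build cache)
    docker system df

    # Verbose mode: per-image / per-container / per-volume breakdown
    docker system df -v
    ```

    If volumes dominate the output, the runner's cache volumes are the likely culprit; if images dominate, stale job images are.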

    How to clean it up

    You can perform the required cleanup with the following command:

    docker container prune -f && docker image prune -af && docker volume prune -f
    

    The prune won't interfere with any containers/images/volumes that are actively in use, so it should be OK to run at any time. You can set this command to run periodically using crontab or other automated means.
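    For example, assuming the docker binary lives at /usr/bin/docker on your host (adjust the path as needed), a nightly crontab entry might look like this sketch:

    ```shell
    # Run the full cleanup every night at 03:00.
    # Containers, images, and volumes still in use are left alone by prune.
    0 3 * * * /usr/bin/docker container prune -f && /usr/bin/docker image prune -af && /usr/bin/docker volume prune -f
    ```

    Use absolute paths in cron entries, since cron runs with a minimal PATH.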

    You could also make this command a bit more robust by adding prune filters to control which images are pruned. For example, you may wish to prune only older images, so that the most recently created images remain cached and keep job startup fast -- e.g.:

    ... docker image prune -af --filter "until=240h" ...
    

    Another solution is the gitlab-runner-docker-cleanup utility container, which lets you accomplish something similar, but with some convenient dynamic controls for when the cleanup is run (e.g., cleanup only when disk space is below a threshold). This is a container that is intended to keep running in the background forever on the runner host system:

    docker run -d \
        -e LOW_FREE_SPACE=10G \
        -e EXPECTED_FREE_SPACE=20G \
        -e LOW_FREE_FILES_COUNT=1048576 \
        -e EXPECTED_FREE_FILES_COUNT=2097152 \
        -e DEFAULT_TTL=10m \
        -e USE_DF=1 \
        --restart always \
        -v /var/run/docker.sock:/var/run/docker.sock \
        --name=gitlab-runner-docker-cleanup \
        quay.io/gitlab/gitlab-runner-docker-cleanup
    

    On local docker volume caching

    As a side note, I've also found that enabling caching is not particularly helpful when you have a distributed fleet of shared runners. The docker volume cache (which is separate from the global/distributed cache) is local to each host, so it has no effect when your builds land on different runners whose local caches are partitioned from one another.

    Although not strictly necessary, if you have multiple shared runners, I would recommend setting disable_cache = true in the [runners.docker] section of your runner configuration. In my experience, this significantly reduces the rate of storage accumulation; YMMV.
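    For reference, the setting lives in the runner's config.toml (typically /etc/gitlab-runner/config.toml); the runner name and default image below are illustrative placeholders:

    ```toml
    # /etc/gitlab-runner/config.toml (illustrative excerpt)
    [[runners]]
      name = "shared-runner-1"     # placeholder name
      executor = "docker"
      [runners.docker]
        image = "alpine:latest"    # placeholder default image
        disable_cache = true       # don't create local docker cache volumes
    ```

    With this set, the runner stops creating per-job cache volumes on the host, while the distributed cache (if configured) continues to work as before.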