
Docker-in-Docker GitLab CI ghost containers


In my organization we use GitLab CI, and our Pipeline contains a Job which runs in a Docker image.

The Job runs a script within the Docker image, and that script itself spins up a Docker container for testing purposes, making this a Docker-in-Docker situation. The GitLab Job definition is very similar to the following:

job:
  image: docker:20.10.7
  tags:
    - our-custom-gitlab-runner-services
  script:
    # print some info
    - docker ps
    - docker network ls
    # build and run
    - docker network create my-net
    - docker build -t my-image .
    - docker run --rm --detach --net my-net --name my-container my-image
    # test
    # (run some tests here)
    # cleanup
    - docker stop my-container
    - docker network rm my-net

Now this seemed to be working fine for a while. I was experimenting and tried removing the --rm flag and the cleanup steps, thinking it didn't matter since the job is self-contained (oh how wrong I was).

Then all hell broke loose.

Upon rerunning the job, the docker ps and docker network ls commands show that my-container and my-net still exist when they should not. Adding extra commands immediately before the build-and-run section to remove them works as a workaround, but docker ps and docker network ls keep showing these ghosts in subsequent runs. It's as if, once created, they exist forever.
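For reference, the cleanup workaround I mentioned looks roughly like this in the job's script section (a sketch using the same resource names as above; the || true keeps the job from failing when there is nothing left over):

yaml_workaround:
  image: docker:20.10.7
  script:
    # workaround: force-remove any leftovers from a previous run
    - docker rm -f my-container || true
    - docker network rm my-net || true
    # then build and run as before
    - docker network create my-net
    - docker build -t my-image .
    - docker run --rm --detach --net my-net --name my-container my-image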

The question is: why does this occur? A correct answer to this post gives a clear explanation of the phenomenon.


Solution

  • This is, in part, because you are using host docker socket mounting as your docker-in-docker executor mechanism. That is to say, your jobs are connected directly to the docker daemon on the host machine.

    When you run docker commands in your job, you're communicating directly from your job container to the host's docker daemon. As a result, the GitLab runner has no way of knowing what containers or networks you have created, or even that any cleanup is necessary. It cannot distinguish networks/containers created by your job from containers that may have been created on the host by other runners or for any other reason. Thus, the GitLab Runner (sensibly) does not make any effort to remove networks/containers it did not directly create.
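    For context, the socket-mounting setup typically looks something like this in the runner's config.toml (a sketch, not your actual configuration; the runner name is taken from the example above):

    # /etc/gitlab-runner/config.toml (sketch)
    [[runners]]
      name = "our-custom-gitlab-runner-services"
      executor = "docker"
      [runners.docker]
        image = "docker:20.10.7"
        # mounting the host socket: every job talks to the HOST's
        # docker daemon, so anything a job creates outlives the job
        volumes = ["/var/run/docker.sock:/var/run/docker.sock"]

    With this configuration, docker run inside the job creates a sibling container on the host, not a child of the job, which is exactly why nothing is torn down when the job ends.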

    One approach that could prevent this, and would make your docker-in-docker builds more hermetic, is to use the dind service architecture. Instead of mounting the docker socket directly into the job, you run an independent docker daemon as a service attached to the job. This way, your job has its own docker daemon, and that daemon gets cleaned up by the runner when the job ends (and when it does, so do all the networks/containers created by your job). You have to configure your GitLab runner and jobs accordingly; this is described, along with setup instructions, in the GitLab documentation.
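    Assuming the docker executor, the runner-side change for the dind service is usually just enabling privileged mode (again a sketch; adapt to your own config.toml):

    [[runners]]
      executor = "docker"
      [runners.docker]
        image = "docker:20.10.7"
        # required so the docker:dind service can start its own daemon
        privileged = true
        # note: do NOT also mount /var/run/docker.sock here,
        # or jobs will bypass the service daemon

    Privileged mode is required because the dind daemon needs capabilities (cgroups, mounts) that an unprivileged container does not have, which is also why this setup is usually paired with isolating runners per project or group.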

    Once you have the runner properly configured, job definitions might look like this:

    myjob:
      image: docker
      services:
        - docker:dind  # <-- your own independent daemon!
      variables:
        # tell the docker client in your job
        # to communicate with the service daemon
        # instead of /var/run/docker.sock
        DOCKER_HOST: tcp://docker:2375
        DOCKER_TLS_CERTDIR: ""
    
      script:
        - docker network ls 
        # won't ever include my-net because 
        # it is destroyed along with the docker:dind service 
        # when your job ends!
        - docker network create my-net
        - docker info
        # ...
    

    This is also a better setup for docker-in-docker with GitLab to make your build systems more secure and (help) prevent jobs from escaping their containers. This is particularly important when different projects utilize the same underlying hosts.