
Jenkins declarative pipeline problem when running docker-in-docker


I just ran into a problem with a Jenkins declarative pipeline on a Jenkins server that itself runs inside Docker, with access to the host's docker.sock.

The structure of the pipeline is rather simple:

pipeline {
    agent {
        docker { image 'gradle:jdk11' }
    }
    stages {
        stage('Checkout') {
            steps {
                // ...
            }
        }
        stage('Assemble public API documentation') {
            environment {
                // ...
            }
            steps {
                // ...
            }
        }
        stage('Generate documentation') {
            steps {
                // ...
            }
        }
        stage('Upload documentation to Firebase') {
            agent {
                docker {
                    image 'node:12'
                    reuseNode false
                }
            }
            steps {
                // ...
            }
        }
    }
}

The idea is to run the first three stages in the gradle:jdk11 container and then create a new container for the final stage. The following is printed when entering the last stage:

[Pipeline] stage
[Pipeline] { (Upload documentation to Firebase)
[Pipeline] getContext
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . node:12
/var/jenkins_home/workspace/publish_public_api_doc@tmp/durable-bc4d65d1/script.sh: 1: /var/jenkins_home/workspace/publish_public_api_doc@tmp/durable-bc4d65d1/script.sh: docker: not found
[Pipeline] isUnix
[Pipeline] sh
+ docker pull node:12
/var/jenkins_home/workspace/publish_public_api_doc@tmp/durable-297d223a/script.sh: 1: /var/jenkins_home/workspace/publish_public_api_doc@tmp/durable-297d223a/script.sh: docker: not found
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
$ docker stop --time=1 367647f97c9eed52bf85c13c2bc2203bb7194adac803d37cab0e0d0435325efa
$ docker rm -f 367647f97c9eed52bf85c13c2bc2203bb7194adac803d37cab0e0d0435325efa
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code 127
Finished: FAILURE

I don't really understand what is happening here. To debug it, I logged in to the machine and ran the docker command both on the host and inside the running Jenkins container, and it worked in both places. In this setup the Docker client is installed in the Jenkins image itself, i.e. the binary is not shared into the container from the host. Since the docker command is "not found", the only explanation I have is that the docker command that starts the agent for the final stage is not executed in the "top-level" Jenkins container but in the gradle:jdk11 one, which does not contain the docker executable. That, however, seems unexpected, if not a bug. I'd be thankful if anyone could shed some light on this.
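A quick way to test this hypothesis would be a throwaway diagnostic stage that runs in the same gradle:jdk11 agent and reports whether a Docker CLI is on its PATH. This is a minimal sketch; the stage name is hypothetical, and `command -v` is the POSIX shell built-in for locating a command.

```groovy
pipeline {
    agent {
        docker { image 'gradle:jdk11' }
    }
    stages {
        // Hypothetical diagnostic stage: runs inside the gradle:jdk11 container
        stage('Check for Docker CLI') {
            steps {
                // Prints the path to docker if it is on the PATH; otherwise prints
                // the fallback message (the || keeps the step from failing)
                sh 'command -v docker || echo "docker CLI not found in this container"'
            }
        }
    }
}
```

If the hypothesis is right, this stage should log the fallback message rather than a path.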


Solution

  • Jenkins pipeline agents/nodes
    Your pipeline specifies an agent at the top-most level. The pipeline executes all commands on that agent (in your scenario, within a docker container) until another agent is specified. When a new agent is specified, the top-level agent connects to it, and the new agent executes all stages/steps within that agent's scope. Once execution leaves that scope, the connection to the new agent is closed and the top-level agent once again executes all commands.

    What's causing the error?
    The fourth stage attempts to change the execution context to a new agent. The current agent, the gradle:jdk11 container, executes the steps needed to start this new agent. Since the new agent is a docker container, the gradle:jdk11 container itself tries to run the docker command (the docker inspect / docker pull calls in your log) to spin up the new container.
    As you suspected, there is no docker binary within this container, hence "docker: not found" and exit code 127 (the shell's "command not found" status).
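    Roughly speaking, the stage-level docker agent here behaves like nesting one Docker Pipeline `docker.image(...).inside` block within another. The scripted-pipeline sketch below is an approximation for illustration, not the exact code Jenkins generates. Every `sh` step in the outer block runs in the gradle:jdk11 container, including the `docker inspect` / `docker pull` calls that `inside` issues for the inner image. The `gradle --version` and `node --version` steps are placeholders.

    ```groovy
    // Illustrative scripted-pipeline approximation of the declarative pipeline above
    node {
        docker.image('gradle:jdk11').inside {
            // The Checkout / documentation stages effectively run here, in gradle:jdk11
            sh 'gradle --version'

            // The nested image is prepared from *within* the outer container, so
            // `docker inspect -f . node:12` and `docker pull node:12` are executed
            // via sh inside gradle:jdk11, which has no docker CLI -> exit code 127
            docker.image('node:12').inside {
                sh 'node --version'   // never reached in the failing setup
            }
        }
    }
    ```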

    Why is this the expected behaviour?
    Suppose the top-level agent were a separate physical machine connected via TCP or SSH rather than a docker container. That machine would need all the tools for compiling, generating docs, running unit tests, etc. installed on it. For example, it wouldn't use a doxygen binary installed on the Jenkins master; it must provide doxygen itself, and the build fails if doxygen isn't on its $PATH. Likewise, this machine would need docker installed to spin up the container in the fourth stage.

    How can I get my pipeline working?

    • You could create a custom Docker image based on gradle:jdk11 that includes the Docker client, and share the host system's Docker daemon with it (e.g. by mounting docker.sock, as your Jenkins container already does). This lets your custom image spin up the container required in the fourth stage. You would use agent { docker { image 'my-custom-img' } } at the global scope.

    • Alternatively, you could use the master agent (or other physical machines) at the global scope and have each stage spin up its own container. Each stage would then have a clean working environment, so you'd need stash/unstash or a mounted volume to share sources/docs between stages.
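For the first option (custom image), a minimal Jenkinsfile sketch follows. It assumes `my-custom-img` has been built FROM gradle:jdk11 with a Docker CLI added, and that the host's socket lives at /var/run/docker.sock. The declarative `args` parameter passes extra options to `docker run`, here a bind mount of that socket. The container user may also need permission to access the socket. Because the host daemon resolves volume paths on the host, the workspace path may need to be identical on the host and in the containers.

```groovy
pipeline {
    agent {
        docker {
            // Hypothetical image: gradle:jdk11 plus a Docker CLI
            image 'my-custom-img'
            // Share the host's Docker daemon by bind-mounting its socket (path assumed)
            args '-v /var/run/docker.sock:/var/run/docker.sock'
        }
    }
    stages {
        stage('Check Docker access') {
            steps {
                // Succeeds only if the CLI exists and can reach the host daemon
                sh 'docker version'
            }
        }
        stage('Upload documentation to Firebase') {
            agent {
                docker {
                    image 'node:12'
                    reuseNode false
                }
            }
            steps {
                // Placeholder for the elided upload steps
                sh 'node --version'
            }
        }
    }
}
```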
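For the second option (one container per stage), a minimal sketch follows. It assumes the top-level node has a Docker CLI, and `agent any` stands in for "the master agent". Newer Jenkins versions label the controller 'built-in' rather than 'master'. The stash name `docs` and the path `build/docs/**` are hypothetical; use whatever the documentation stages actually produce.

```groovy
pipeline {
    // Assumption: any node with a Docker CLI, e.g. the Jenkins controller
    agent any
    stages {
        stage('Generate documentation') {
            agent { docker { image 'gradle:jdk11' } }
            steps {
                // Placeholder for the elided checkout/build steps
                sh 'gradle --version'
                // Save the generated docs so a later stage on another workspace can use them
                stash name: 'docs', includes: 'build/docs/**'
            }
        }
        stage('Upload documentation to Firebase') {
            agent { docker { image 'node:12' } }
            steps {
                // Restore the files stashed by the previous stage
                unstash 'docs'
                // Placeholder for the elided upload steps
                sh 'ls -R build/docs'
            }
        }
    }
}
```

Stash/unstash copies files through the controller, so a shared mounted volume may be preferable for large outputs.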