Tags: linux, docker, jenkins, docker-compose, jenkins-pipeline

Jenkins sh step gets stuck in a remote Docker agent


Problem

I configured a remote Docker daemon on Server A that serves the Docker API at tcp://server_a:2376.

Jenkins runs on Server B, deployed from the jenkinsci/blueocean Docker image.

I can reach the Docker daemon on Server A over that TCP port:

DOCKER_HOST=tcp://<server_a>:2376 docker ps
DOCKER_HOST=tcp://<server_a>:2376 docker exec some_container "ls"

Both commands work as expected.

However, when I run a Pipeline script that uses the Docker daemon on Server A as the agent, the sh step gets stuck and then fails with:

process apparently never started in /var/jenkins_home/workspace/agent-demo@tmp/durable-1ddcfc03

(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)


Pipeline Script

node {
    docker.withServer('tcp://<server_a>:2376') {
        docker.image('python:latest').inside() {
            sh "python --version"
        }
    }
}

Pipeline Console Output


Started by user iotsofttest
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] node
Running on Jenkins in /var/jenkins_home/workspace/agent-demo
[Pipeline] {
[Pipeline] withDockerServer
[Pipeline] {
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . python:latest
.
[Pipeline] withDockerContainer
Jenkins seems to be running inside container 5be8fc34c80a55ddcc2f5399009b97260adfc7ba9ef88985e0f7df614c707b42
but /var/jenkins_home/workspace/agent-demo could not be found among []
but /var/jenkins_home/workspace/agent-demo@tmp could not be found among []
$ docker run -t -d -u 0:0 -w /var/jenkins_home/workspace/agent-demo -v /var/jenkins_home/workspace/agent-demo:/var/jenkins_home/workspace/agent-demo:rw,z -v /var/jenkins_home/workspace/agent-demo@tmp:/var/jenkins_home/workspace/agent-demo@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** python:latest cat
$ docker top 25dccce629d42d82b177c79544cdcd2675bad8daf94f11c55f7f9821eb6e052e -eo pid,comm
[Pipeline] {
[Pipeline] sh
process apparently never started in /var/jenkins_home/workspace/agent-demo@tmp/durable-1ddcfc03
(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
[Pipeline] }
$ docker stop --time=1 25dccce629d42d82b177c79544cdcd2675bad8daf94f11c55f7f9821eb6e052e
$ docker rm -f 25dccce629d42d82b177c79544cdcd2675bad8daf94f11c55f7f9821eb6e052e
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // withDockerServer
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code -2
Finished: FAILURE

I have spent days on this problem. Any ideas?


Solution

  • !!! Finally !!!

    After several days of attempts, I found what causes the problem.


    Reproduction of the problem environment

    • Server A: Jenkins master node, deployed with docker-compose from the jenkinsci/blueocean image
    • Server B: Jenkins agent node (JNLP), deployed with docker-compose from the jenkins/inbound-agent:alpine image


    I mounted /var/run/docker.sock into both the Jenkins master container and the Jenkins agent container, so that each Jenkins node can talk to its host's Docker daemon from inside the container and Docker agents can be used in our pipelines.

    I then wrote the following pipeline script:

    pipeline {
        agent {
            docker {
                image 'python:latest'
                label 'agent-hkgw'
            }
        }
        stages {
            stage('main') {
                steps {
                    sh 'python --version'
                }
            }
        }
    }
    

    When we ran the build, it got stuck at the sh step. That is the issue I described in my question.
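
    The console output in the question says the workspace "could not be found among []", which suggests that Jenkins looks for the workspace path among the mounts of the container it runs in. The following is a minimal shell sketch of that kind of check, not the plugin's actual code; the function name path_under_mount and the example paths are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a workspace path lies under one of a container's
# mount destinations. All names here are illustrative.

# path_under_mount PATH MOUNT...
# Prints the first mount that covers PATH and returns 0, or returns 1 if
# no mount covers it.
path_under_mount() {
    local path=$1 mount prefix
    shift
    [ -n "$path" ] || return 1
    for mount in "$@"; do
        [ -n "$mount" ] || continue       # skip blank entries
        prefix=${mount%/}                 # drop one trailing slash ("/" becomes "")
        case "$path" in
            "$mount"|"$prefix"|"$prefix"/*)
                printf '%s\n' "$mount"
                return 0
                ;;
        esac
    done
    return 1
}

# With a Docker daemon available, the mount destinations of the agent
# container could be listed like this (<agent_container> is a placeholder):
#   mapfile -t mounts < <(docker inspect \
#       -f '{{range .Mounts}}{{println .Destination}}{{end}}' <agent_container>)
#   path_under_mount /var/jenkins/workspace/agent-demo "${mounts[@]}"
```

    If the function finds no covering mount for the workspace, the agent container has no declared volume for it, which matches the situation described here.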


    SOLUTION

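    After many failed attempts, I noticed that the volume passed from the Jenkins agent node to the container started by the pipeline had never been explicitly mounted into the Jenkins agent container itself. Because the agent talks to the host's Docker daemon through /var/run/docker.sock, the -v paths in docker run are resolved on the host (Server B), not inside the agent container, so the build container presumably never saw the script that the sh step writes into the workspace's @tmp directory. I therefore mounted the /var/jenkins folder from Server B into the Jenkins agent container.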
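
    The mount described above can be sketched as docker run volume flags. This is only an illustration under assumptions: the original setup used docker-compose, the host and container paths are assumed to be identical (/var/jenkins on both sides), and the function name agent_volume_flags is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: volume flags for the Jenkins agent container.
# Assumption: the agent's work directory is /var/jenkins and is mounted at
# the SAME path on the host and in the container, so that a later
# "-v <workspace>:<workspace>" issued through docker.sock points at the
# directory the agent really writes to.

AGENT_WORKDIR=/var/jenkins

# Print the volume flags, one argument per line.
agent_volume_flags() {
    local workdir=$1
    printf '%s\n' \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v "${workdir}:${workdir}"
}

agent_volume_flags "$AGENT_WORKDIR"

# To start the agent for real (requires Docker; the connection arguments
# for your Jenkins master are placeholders and not shown here):
#   mapfile -t vols < <(agent_volume_flags "$AGENT_WORKDIR")
#   docker run -d "${vols[@]}" jenkins/inbound-agent:alpine <connection arguments>
```

    In docker-compose the same two mappings would go under the agent service's volumes key. The agent's remote root directory in Jenkins is assumed to point at the mounted path.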

    After that, the pipeline build worked like a charm!

    I hope this helps anyone who runs into the same problem in the future. Thanks to everyone who tried to help with an answer.