Tags: jenkins, amazon-ec2, docker, jenkins-plugins, ec2-ami

First execution of Docker on a new EC2 Jenkins Slave does not work


I'm using the EC2 Plugin in Jenkins to spin up slave instances when we need them. I recently wanted to try Docker, so I installed it on the AMI we use for slaves, but the first run on a new slave never seems to work.

+ docker ps
time="2015-04-17T15:38:20Z" level="fatal" msg="Get http:///var/run/docker.sock/v1.16/containers/json: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?" 

Any runs after this seem to work, so why won't the slave work on the first job? I've tried using sudo and running docker ps before docker build, but nothing seems to fix the problem.


Solution

  • The problem is that Jenkins only waits for the slave to respond to an SSH connection; it does not check that Docker is running.

    To prevent the slave from coming "online" too early, add a check to the "Init Script" field in the EC2 Slave Plugin configuration. Here's the one I use against the base AMI.

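    # Poll once per second, for up to 300 seconds, until the Docker service reports that it is running.
    sleep_counter=0
    while [[ -z $(/sbin/service docker status | grep " is running...") && $sleep_counter -lt 300 ]]; do
      sleep 1
      ((sleep_counter++))
      echo "Waiting for docker $sleep_counter seconds - $(/sbin/service docker status)"
    done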
    

    Amazingly, it can take up to 60 seconds between the slave coming up and the Docker service starting, so I've set the timeout to be 5 minutes.
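
If the exact wording of the service status output differs on your AMI, the same wait can be written against a command's exit status instead of grepping for " is running...". This is only a rough sketch: `wait_for_command` is a helper name I made up, not something Jenkins or Docker provides, and it assumes the Init Script runs under a shell that supports `local` (bash does).

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
# Usage: wait_for_command <timeout_seconds> <command> [args...]
wait_for_command() {
  local timeout=$1
  shift
  local waited=0
  # Run the probe command quietly; loop for as long as it keeps failing.
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Gave up after $waited seconds waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ready after about $waited seconds: $*"
}
```

In the Init Script you could then call `wait_for_command 300 docker info`, since `docker info` exits with a non-zero status while the daemon is not reachable. The 300 seconds is the same 5 minute limit as above.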