Issues with running a consul docker health check


I am running the progrium/consul container alongside the gliderlabs/registrator container. I am trying to create health checks to monitor whether my Docker containers are up or down. However, I noticed some very strange behavior with the one health check I was able to create. Here is the command I used to register it:

curl -v -X PUT http://$CONSUL_IP_ADDR:8500/v1/agent/check/register -d @/home/myUserName/health.json

Here is my health.json file:

{
"id": "docker_stuff",
"name": "echo test",
"docker_container_id": "4fc5b1296c99",
"shell": "/bin/bash",
"script": "echo hello",
"interval": "2s"
}

First, I noticed that this check would automatically delete the service whenever the container was stopped properly, but would do nothing when the container was stopped improperly (i.e. during a node failure).

Second, I noticed that the docker_container_id did not matter at all: this health check attached itself to every container running on the Consul node it was registered on.

I would like to have a working TCP or HTTP health check run for every Docker container running on a Consul node (yes, I know my JSON file above runs a script; I just created that one following the documentation example). I just want Consul to be able to tell whether a container is stopped or running. I don't want my services to delete themselves when a health check fails. How would I do this?

Note: I find the consul documentation on Agent Health Checks very lacking, vague and inaccurate. So please don't just link to it and tell me to go read it. I am looking for a full explanation on exactly how to set up docker health checks the right way.

Update: Here is how to start Consul servers with the most current version of the official consul container (right now these are the dev versions; I'll update this with the production versions soon):

#bootstrap server
docker run -d \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
--name=dev-consul0 consul agent -dev -ui -client 0.0.0.0

# its IP address will then be the IP of the host machine
# let's say it's 172.17.0.2

# start the other two consul servers, without the web UI
docker run -d --name=dev-consul1 \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
consul agent -dev -join=172.17.0.2

docker run -d --name=dev-consul2 \
-p 8300:8300 \
-p 8301:8301 \
-p 8301:8301/udp \
-p 8302:8302 \
-p 8302:8302/udp \
-p 8400:8400 \
-p 8500:8500 \
-p 53:53/udp \
consul agent -dev -join=172.17.0.2

# then here are your clients
docker run -d --net=host --name=client0 \
-e 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}' \
consul agent -bind=$(hostname -i) -retry-join=172.17.0.2

https://hub.docker.com/r/library/consul/


Solution

  • One solution that sidesteps the Consul containers altogether, whatever their version, is to install Consul directly on the host machine. This can be done by following these steps from https://sonnguyen.ws/install-consul-and-consul-template-in-ubuntu-14-04/:

    sudo apt-get update -y
    sudo apt-get install -y unzip curl
    sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
    
    sudo unzip consul_0.6.4_linux_amd64.zip  
    sudo rm consul_0.6.4_linux_amd64.zip
    
    sudo chmod +x consul
    sudo mv consul /usr/bin/consul
    
    sudo mkdir -p /opt/consul
    cd /opt/consul
    sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_web_ui.zip  
    sudo unzip consul_0.6.4_web_ui.zip 
    sudo rm consul_0.6.4_web_ui.zip
    
    sudo mkdir -p /etc/consul.d/
    
    sudo wget https://releases.hashicorp.com/consul-template/0.14.0/consul-template_0.14.0_linux_amd64.zip
    sudo unzip consul-template_0.14.0_linux_amd64.zip
    sudo rm consul-template_0.14.0_linux_amd64.zip
    sudo chmod a+x consul-template
    sudo mv consul-template /usr/bin/consul-template
    
    
    sudo nohup consul agent -server -bootstrap-expect 1 \
      -data-dir /tmp/consul -node=agent-one \
      -bind=$(hostname -i) \
      -client=0.0.0.0 \
      -config-dir /etc/consul.d \
      -ui-dir /opt/consul/ &
    
    echo 'Done with consul install!!!'
    

    Once this is done, create your Consul health check JSON files; info on how to write them can be found here. Put the files in the /etc/consul.d directory and have the running agent pick them up with consul reload. If the new health checks do not appear after the reload, there is probably something wrong with the syntax of your JSON files. Go back, edit them, and try again.
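
As a rough illustration of what such a file might look like for the TCP or HTTP case asked about in the question, here is a minimal sketch. The helper name write_check, the check IDs, and the ports (8080, 5432) are all made up for the example; the only Consul-specific parts are the "check" wrapper and the "http", "tcp", and "interval" fields of a check definition. It assumes each container publishes a port on the host that the agent can reach.

```shell
#!/bin/sh
# Sketch: write one Consul check definition per file into a config directory.
# write_check is a hypothetical helper for this example, not part of Consul.
# Usage: write_check DIR ID NAME TYPE TARGET [INTERVAL]
#   TYPE is "http" (TARGET is a URL) or "tcp" (TARGET is host:port).
write_check() {
    dir=$1 id=$2 name=$3 type=$4 target=$5 interval=${6:-10s}
    case $type in
        http|tcp) ;;
        *) echo "write_check: unsupported type: $type" >&2; return 1 ;;
    esac
    mkdir -p "$dir" || return 1
    # The check type doubles as the JSON key: "http": URL or "tcp": host:port.
    cat > "$dir/check-$id.json" <<EOF
{
  "check": {
    "id": "$id",
    "name": "$name",
    "$type": "$target",
    "interval": "$interval"
  }
}
EOF
}

# Example with assumed values: a web container published on host port 8080
# and a database container published on host port 5432.
# write_check /etc/consul.d web-http "web HTTP" http http://localhost:8080/ 5s
# write_check /etc/consul.d db-tcp "db TCP" tcp localhost:5432
# consul reload
```

An HTTP check passes on a 2xx response and a TCP check passes when the connection succeeds, so a stopped container should show up as a critical check rather than disappearing. If a check does not appear after consul reload, running each file through a JSON validator (for example python3 -m json.tool, if Python happens to be installed) is a quick way to rule out syntax errors.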