
What exactly does Docker Swarm's dispatcher-heartbeat do?


I am managing a couple of stacks on a couple of nodes running in Docker Swarm. Unfortunately, it seems a worker node lost its connection to the single manager I currently have, which put the node into a faulty state and shut down the services running on that node.

After some searching I see that there is the option to set dispatcher-heartbeat in the swarm with this command.

docker swarm update --dispatcher-heartbeat 5s

Apparently the default is 5 seconds, and some suggest setting it to 30s instead. What exactly does this setting do? Is it possible to set it to a very high number instead? I just want the services to keep running on the node no matter what.


Solution

  • We had a similar issue, but in combination with VMware vSphere (and additional services that move VMs between hosts). While a node was being moved, it was down or seemed unreachable for a few moments, but the swarm had already marked it as down and rescheduled its services, even though the node came back up almost instantly.

    So, to the question: Docker Swarm uses this parameter (dispatcher-heartbeat) to tell its nodes how often they have to report their health status back to the manager. With the default value of 5s, for instance, a node that does not respond within a 5s interval is marked as down/unreachable.

    Back then I found a notable discussion (linked below) in which a company tested dispatcher-heartbeat values of 30s, 60s, and 90s. It is probably worth a look.

    https://github.com/moby/moby/issues/38321#issuecomment-450138988

    Hope it helps.
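
As a rough sketch of how the heartbeat period described above might be raised on a manager node: the `set_heartbeat` function and the `DRY_RUN` variable are hypothetical names made up for this example, while `docker swarm update --dispatcher-heartbeat` is the real CLI command. The 30s default is just the value mentioned in the question, not a tested recommendation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): wraps the real
# `docker swarm update --dispatcher-heartbeat` command.
# With DRY_RUN=1 it only prints the command it would run.
set_heartbeat() {
  period="${1:-30s}"   # 30s is the value suggested in the question, an assumption here
  cmd="docker swarm update --dispatcher-heartbeat $period"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

# Usage, on a manager node:
#   set_heartbeat 30s
#   docker info | grep -i "heartbeat period"   # confirm the value now in effect
#   docker node ls                             # node status as the manager sees it
```

Printing the command first with `DRY_RUN=1` makes it easy to check what will be changed before touching a live swarm.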