
Understanding docker swarm in terms of high availability


I am currently trying to understand what is needed to set up a Docker Swarm that makes a service highly available. I have read through a lot of the Docker Swarm documentation, but if my understanding is correct, Swarm will simply run a service on any available host. What happens if a host fails? Does the swarm manager restart the service(s) that were running on that host/node on another one? Is there a better explanation of this than the one in the original documentation found here?


Solution

  • Nothing more complex than that, really. As the documentation says, Swarm (like Kubernetes and most other tooling in this space) is declarative: you tell it the state that you want (e.g. 'I want 4 instances of redis') and Swarm converges the system to that state. If you have 3 nodes, it might, for example, schedule 1 redis on Node 1, 1 on Node 2, and 2 on Node 3. If Node 2 dies, the system no longer complies with your declared state, so Swarm schedules a replacement redis on Node 1 or Node 3 (depending on strategy, etc.).
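To make the "declare a state, let the system converge" idea concrete, here is a minimal toy sketch in Python. It is not Swarm's actual scheduler: the `reconcile` function, the node names, and the least-loaded placement rule are all assumptions made for illustration.

```python
from collections import Counter


def reconcile(desired, tasks, healthy_nodes):
    """Return a task list that matches the desired replica count.

    `tasks` holds one node name per running replica. This is a toy model
    of a reconciliation loop, not Swarm's real scheduling algorithm.
    """
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes left to schedule on")
    # Drop replicas that were running on nodes that are no longer healthy.
    running = [node for node in tasks if node in healthy_nodes]
    # Shed surplus replicas if more are running than were declared.
    running = running[:desired]
    # Place each missing replica on the least-loaded healthy node
    # (a simple "spread" policy, assumed here for illustration).
    while len(running) < desired:
        load = Counter(running)
        target = min(sorted(healthy_nodes), key=lambda n: load[n])
        running.append(target)
    return running


# Declared state: 4 replicas across 3 hypothetical nodes.
nodes = {"node1", "node2", "node3"}
tasks = reconcile(4, [], nodes)

# node2 fails; the next reconciliation pass restores 4 replicas
# using only the surviving nodes.
nodes.discard("node2")
tasks = reconcile(4, tasks, nodes)
print(Counter(tasks))
```

The point of the sketch is that nothing special happens on failure: the same function that created the initial placement is simply run again against the reduced set of nodes.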

    Now, this dynamic scheduling of containers / tasks / instances brings another problem: discovery. Swarm deals with this by maintaining an internal DNS registry and by creating a VIP (virtual IP) for each service. Instead of having to address or keep track of each redis instance, I can point to a service alias and Swarm will automatically route traffic to where it needs to go.
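As a rough illustration of what that looks like from a client's point of view, the sketch below resolves a name to its IPv4 addresses using Python's standard `socket` module. The helper `resolve_ips` is hypothetical, and the service name `redis` is just the example from above; the resolver is passed in as a parameter only so the function can be exercised without a running swarm.

```python
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for a hostname."""
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})
```

Assuming a service named `redis` in the default VIP endpoint mode, calling `resolve_ips("redis")` from a container on the same overlay network should return the single virtual IP, while `resolve_ips("tasks.redis")` should return the addresses of the individual tasks. In most cases the client only needs the service name and can ignore where the replicas actually run.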

    Of course there are also other considerations:

    • Can your service support multiple backend instances? Is it stateless? Sessions? Cache? Etc...
    • What is 'HA'? Multi-node? Multi-AZ? Multi-region? Etc...