I am considering building out a Docker Swarm cluster. For the purpose of keeping things both simple and relatively fault-tolerant, I thought about simply running 3 nodes as managers.
What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?
I found this GitHub issue which asks a similar question, but the answer is a bit ambiguous to me. It mentions that performance may be worse and that it will take longer to reach consensus. In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?
TL;DR pros and cons of all managers as workers in Swarm:
Pros:
- Prod-quality HA with only 3 or 5 servers (a minimal bootstrap sketch follows these lists)
- Simplicity of design/management
- Still secure by default (secrets are encrypted on disk, mutual TLS auth and network encryption on control plane)
- Any node can administer the Swarm
Cons:
- Requires tighter management of resources to prevent manager starvation
- Weaker security posture: secrets/keys are stored on the same servers that run your apps
- A compromised node means the whole Swarm could easily be compromised
- Effectively limited to an odd number of servers, usually 3 or 5
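To make the TL;DR concrete, here is roughly what bootstrapping a three-manager-only Swarm looks like (the IP address and token below are placeholders; adjust for your hosts):

```bash
# On the first node, create the Swarm and advertise its address
docker swarm init --advertise-addr 10.0.0.11

# Print the join command that carries a *manager* token
docker swarm join-token manager

# On nodes 2 and 3, paste the printed command, e.g.:
docker swarm join --token SWMTKN-1-<token> 10.0.0.11:2377

# Verify: all three nodes should show a MANAGER STATUS of Leader or Reachable
docker node ls
```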
Full Answers to Your Questions
What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?
There are no hard requirements for using worker-only nodes. If you're deploying a solution where you know what resources you need, and the number of services/tasks is usually the same, there's nothing wrong with a Swarm of just three managers doing all the work, as long as you have considered these three areas that are affected:
- Security. In a perfect world, your managers would not be internet accessible and would only be on a backend subnet, doing only manager work. The managers have all the authority for the Swarm, hold all the encrypted secrets, store the encrypted Raft log, and also (by default) store the encryption keys on disk (see the autolock sketch after this list for one mitigation). Workers only store the secrets they need (and only in memory), and have no authority to do any work in the Swarm other than what they've been told to do by the leader. If a worker gets compromised, you haven't necessarily "lost the Swarm". This separation of powers is not a hard requirement, and many environments accept this risk and just use the managers as the main servers that publish services to the public. It's a question of security/complexity vs. cost.
- Node count. The minimum number of managers for redundancy is 3, and 3 or 5 is what I recommend most of the time. More managers do not equal more capacity: only one manager is the leader at any time, and it is the only one doing manager work, so the leader's resource capacity determines how much Swarm work can happen simultaneously. If your managers are also doing app work and you need more resource capacity than 3 nodes can handle, I'd recommend that the 4th node and higher be just workers (see the worker-join sketch after this list).
- Performance/scale. Ideally, your managers have all the resources they need to do things fast: leader election, task scheduling, running and reacting to healthchecks, etc. Their resource utilization grows with the total number of nodes, total services, and the rate of new work they have to perform (service/network creation, task changes, node changes, healthchecks, etc.). If you have a small number of servers and a small number of services/replicas, the managers can also be workers as long as you're careful to prevent your apps (especially databases) from starving the Docker daemon of resources so badly that Swarm can't do its job; resource limits on services help here (see the resource-limits sketch after this list). When you start seeing random leader changes or errors/failures, "check the managers for available resources" should be on your short list of troubleshooting steps.
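On the security point above: if you run apps on managers and are worried about the Raft keys sitting unencrypted on disk, Swarm's autolock feature is one mitigation. A minimal sketch; the trade-off is that a restarted manager stays locked (and out of the manager pool) until you unlock it manually:

```bash
# Encrypt the Raft log keys at rest; Docker prints an unlock key to store safely
docker swarm update --autolock=true

# After a daemon restart on a manager, unlock it so it can resume manager duties
docker swarm unlock

# Retrieve the current unlock key from any unlocked manager
docker swarm unlock-key
```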
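On the node-count point: when you do outgrow three managers, joining the 4th (and later) nodes as plain workers is just a different join token. A sketch, reusing the same placeholder address as above:

```bash
# On any manager, print the join command that carries a *worker* token
docker swarm join-token worker

# On the new node, paste the printed command, e.g.:
docker swarm join --token SWMTKN-1-<worker-token> 10.0.0.11:2377

# Verify: workers show an empty MANAGER STATUS column
docker node ls
```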
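On the performance point, "use resource limits on services" looks like this on the CLI. A sketch with hypothetical service names and sizes; tune the reservations and limits to your hardware:

```bash
# Reserve and cap resources at creation time so an app (here a database)
# can't starve dockerd and the Swarm manager of CPU/memory
docker service create \
  --name db \
  --replicas 1 \
  --reserve-cpu 1 --reserve-memory 1g \
  --limit-cpu 2 --limit-memory 2g \
  postgres:16

# Or add limits to an existing service
docker service update --limit-cpu 0.5 --limit-memory 512m web
```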
Other questions:
In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?
More managers = longer for the remaining managers to elect a new leader when one goes down. While there is no leader, the Swarm is in a read-only state: new replica tasks cannot be launched, service updates won't happen, and any container that fails won't auto-recover because the Swarm managers can't do work. Your running apps, ingress routing mesh, etc. all still function. The performance of manager health checks and leader election is tied to network latency between the manager nodes as much as it is to the number of managers. This is why Docker generally advises that a single Swarm's managers all be in the same region, so they get a low-latency round trip between each other. There is no hard rule here: if you test 200ms latency between managers, test failures, and are fine with the results and the speed of leader election, cool.
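If you want to see what "no leader" actually looks like, one rough failover test (sketched here for systemd hosts) is to stop the Docker daemon on the current leader and watch manager status from another node:

```bash
# From a surviving manager, watch who holds the Leader role
watch -n1 "docker node ls --format '{{.Hostname}}: {{.ManagerStatus}}'"

# On the current leader, stop the daemon to simulate a failure
# (stop the socket too so it isn't reactivated on the next API call)
sudo systemctl stop docker.service docker.socket

# While there is no leader, commands like `docker service update` fail,
# but already-running containers and the ingress mesh keep serving traffic.
```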
Background info: