
Does Accumulo actually need all ZooKeeper servers listed?


Accumulo's documentation expects all ZooKeeper servers to be listed in the instance.zookeeper.host setting in both accumulo-site.xml and client.conf. Is a single load-balanced ZooKeeper hostname sufficient for either (or both) of these settings? Or are there reasons to prefer an explicit list of all ZooKeeper hosts? I would expect the two approaches to be practically equivalent, not just at initial startup but also in performance on a sizeable cluster.

The Accumulo documentation doesn't go into sufficient detail on what specifically it does with each of the ZooKeeper servers listed. If it simply does its own load balancing, then I should be fine to provide the single load-balanced hostname. If there's some other magic done, then I would appreciate any insight.

I'm attempting to integrate Accumulo and some related services with a Consul service mesh. Consul offers internal DNS resolution for configured services, e.g. zookeeper.service.mydatacenter.consul, which provides rudimentary load balancing by randomly ordering the returned IP addresses of all such hosts. There are at least a couple of options for configuring Accumulo with all of the ZooKeeper hosts, but they bring complications I'd rather avoid. For example, a static set of "generic" hostnames doesn't let us easily change the size of the ZooKeeper cluster, while using Consul Template to list the ZooKeeper nodes dynamically requires a rolling restart of the Accumulo cluster whenever the list changes (which has a whole host of complications of its own). I'm also open to alternative suggestions.

Note: This question is not specifically about the Consul service mesh, as it's simply the mechanism I'm using for load balancing (or alternatively for listing the ZooKeeper servers). I'm most interested in the advantages or disadvantages of configuring Accumulo with a single load-balanced ZooKeeper hostname.


Solution

  • ZooKeeper servers operate as a coordinated group, where the group as a whole determines the value of a field at any given time, based on consensus among the servers. If you have a 5-node ZooKeeper instance running, all 5 server names are relevant. You should not simply treat them as 5 redundant 1-node instances. Accumulo, and other ZooKeeper clients, actually use all of the servers listed. More information at https://zookeeper.apache.org
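
For illustration, here is a minimal sketch of assembling the comma-separated host:port list that a setting like instance.zookeeper.host expects. The class ZkConnectString and its build method are hypothetical helpers, not part of Accumulo or ZooKeeper, and the example hostnames are made up. It assumes plain hostnames or IPv4 addresses (no IPv6 literals) and falls back to ZooKeeper's default client port, 2181, when no port is given.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not an Accumulo or ZooKeeper API.
final class ZkConnectString {
    // ZooKeeper's default client port; change this if your ensemble differs.
    static final int DEFAULT_CLIENT_PORT = 2181;

    // Builds "host1:2181,host2:2181,..." from a list of hosts.
    // Entries that already carry a port are kept as-is; blanks are skipped.
    // Assumes hostnames or IPv4 addresses (an IPv6 literal would be misread).
    static String build(List<String> hosts) {
        return hosts.stream()
                .map(String::trim)
                .filter(h -> !h.isEmpty())
                .map(h -> h.contains(":") ? h : h + ":" + DEFAULT_CLIENT_PORT)
                .distinct()   // drop duplicates after ports are normalized
                .sorted()     // stable ordering so the generated value does not churn
                .collect(Collectors.joining(","));
    }
}
```

Called with, say, zk1.example.com, zk2.example.com and zk3.example.com, this yields a single string naming every member of the ensemble, which could then be written into accumulo-site.xml and client.conf by whatever templating mechanism you already use.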