Search code examples
apache-kafkaapache-zookeeperdistributed-systemconsistent-hashing

Zookeeper-Kafka and Consistent hashing


I am learning Zookeeper and I was stuck in middle with some confusion. I gone through various forums and questions and none clear my confusion and came to SO finally to get some clarification on the following things.

  1. As I understand Zookeeper works in master-worker architecture. So how Kafka fits in this architecture? Does each Kafka broker in the Kafka cluster acts as a client to zookeeper server ensemble or the user applications that produces and consumes messages act as clients to Zookeeper ensemble?

  2. For a particular topic/partition one Kafka broker would get engaged and if its get tons of messages (that it cannot handle), is it possible to distribute the work load using consistent hashing and how Zookeeper architecture support this?

Update: Does Zookeeper is something like variant of Gossip protocol used in DynamoDB for membership and failure detetion


Solution

  • I recommend going over the Zookeeper documentation (specially the Overview section) to clarify its main concepts and how it works.

    1. Kafka brokers act as Zookeeper clients. They connect to Zookeeper to read and write data about the state of the Kafka cluster.

      You may be confused by Zookeeper being a leader/follower system. Within the Zookeeper ensemble one of the Zookeeper servers acts as the leader and effectively handles requests. The followers forward requests to the leader.

    2. Kafka messages are not written to Zookeeper. Zookeeper only stores the topic/partitions metadata (topic configurations, replica and ISR list). Kafka brokers store the messages on their disks. Kafka producers decide the partition (hence the broker) when sending a message. The default partitioner can use round-robin to spread messages across brokers.

    Zookeeper uses its own consensus algorithm (Zab). You can find a description of it in the Zookeeper Wiki.