apache-kafka, kafka-producer-api

What is the difference between Kafka partitions and Kafka replicas?


I set up 3 Kafka brokers with broker ids 20, 21 and 22. Then I created this topic:

bin/kafka-topics.sh --zookeeper localhost:2181 \
  --create --topic zeta --partitions 4 --replication-factor 3

which resulted in:

[Screenshot of the topic description, showing the leader broker of each partition, e.g. Leader: 21 for partition 0]

When a producer sends message "hello world" to topic zeta, to which partition does Kafka first write the message?

Does the "hello world" message get replicated in all 4 partitions?

Does each of the 3 brokers contain all 4 partitions? How is that related to the replication factor of 3 in the above context?

If I have 8 consumers running in their own processes or threads in parallel, all subscribed to the zeta topic, how are partitions or brokers assigned by Kafka to serve these in parallel?


Solution

  • Replication and Partitions are two different things.

    Replication copies identical data across the cluster for higher availability and durability. Partitions are Kafka's way of distributing non-redundant data across the cluster; throughput scales with the number of partitions.

    When a producer sends message "hello world" to topic zeta, to which partition does Kafka first write the message?

    When you send a "hello world" message to a topic, by default your producer applies a hashing algorithm to the key of that message (like hash(key) % number_of_partitions). If you did not provide a key, older producers distribute messages round-robin (since Kafka 2.4 the default partitioner instead sticks to one partition per batch), so it is not predictable to which partition the message will be sent. My guess is that the first message ends up in partition 0, but you should not rely on that.
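
To make the two cases concrete, here is a minimal sketch of the idea in Python. It is not the producer's real code: `make_partitioner` and `choose_partition` are hypothetical names, `zlib.crc32` is only a stand-in for the hash the Kafka client actually uses, and the keyless branch shows plain round-robin starting at 0, which a real client does not guarantee.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that maps an optional message key to a partition number."""
    # Counter used only for messages without a key (round-robin).
    # Starting at 0 is a choice of this sketch, not a Kafka guarantee.
    keyless_counter = itertools.count()

    def choose_partition(key=None):
        if key is None:
            # No key: cycle through the partitions in turn.
            return next(keyless_counter) % num_partitions
        # Key present: hash(key) % number_of_partitions.
        # zlib.crc32 is a stand-in here; it is NOT the hash Kafka uses.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose_partition


choose_partition = make_partitioner(4)  # topic zeta has 4 partitions

# The same key always maps to the same partition.
keyed = [choose_partition("user-42") for _ in range(3)]
print(keyed)

# Keyless messages are spread over all partitions.
keyless = [choose_partition() for _ in range(8)]
print(keyless)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

The point to take away is that a key pins related messages to one partition, which is what gives you per-key ordering, while keyless messages carry no such guarantee.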

    Does the "hello world" message get replicated in all 4 partitions?

    No. The message is written to exactly one of the 4 partitions, and that partition is then copied to all of its replicas.

    With a replication factor of 3 you will find the message on brokers 20, 21 and 22. However, each partition has a leader which is responsible for all reads and writes from and to that partition. In your screenshot you can also spot the broker id of the leader of each partition: from Leader: 21 for partition 0 you can tell that the leader of that partition sits on broker 21.

    Does each of the 3 brokers contain all 4 partitions? How is that related to the replication factor of 3 in the above context?

    Yes. Since you set the replication factor to 3 and have 3 brokers in total, every broker holds a replica of each of the four partitions. Again, there is a difference between partitions and replicas: you could have a Kafka "cluster" with a single broker and still have, say, 20 partitions in the topic.
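
The relationship between brokers, partitions and replication factor can be sketched in a few lines of Python. This is an illustration only: `assign_replicas` and `partitions_per_broker` are hypothetical helpers, and the simple round-robin placement below is an assumption, not the placement algorithm Kafka itself runs.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Place each partition's replicas on distinct brokers (simple round-robin sketch)."""
    if replication_factor > len(broker_ids):
        # Two replicas of one partition cannot share a broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition, then take the next ones in order.
        assignment[partition] = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
    return assignment


def partitions_per_broker(assignment):
    """Invert the assignment: which partitions does each broker hold a replica of?"""
    hosted = {}
    for partition, replicas in assignment.items():
        for broker in replicas:
            hosted.setdefault(broker, []).append(partition)
    return hosted


# Topic zeta: 3 brokers, 4 partitions, replication factor 3.
zeta = assign_replicas([20, 21, 22], num_partitions=4, replication_factor=3)
zeta_hosted = partitions_per_broker(zeta)
print(zeta_hosted)  # every broker hosts partitions 0, 1, 2 and 3

# A single broker can still serve a topic with 20 partitions (replication factor 1).
single = assign_replicas([20], num_partitions=20, replication_factor=1)
print(len(single))  # 20
```

With the replication factor equal to the broker count, every broker necessarily appears in every partition's replica list. With a lower replication factor, each broker would hold only a subset of the partitions.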

    If I have 8 consumers running in their own processes or threads in parallel, all subscribed to the zeta topic, how are partitions or brokers assigned by Kafka to serve these in parallel?

    That depends on whether those 8 consumers belong to the same consumer group. It is important to know that one partition can be read by at most one consumer thread from a particular consumer group.

    If all 8 consumers belong to the same group, 4 of them will each read from one partition (only from the partition leader) and the other four will be idle. If they belong to different groups, each group is assigned all four partitions independently of the others.
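
The "one partition, at most one consumer per group" rule can be sketched as follows. This is a simplified illustration: `assign_partitions` is a hypothetical helper, the consumer names are made up, and the round-robin order is an assumption rather than the behaviour of any particular Kafka assignor.

```python
def assign_partitions(consumers, num_partitions):
    """Give each partition to exactly one consumer of a single group (round-robin sketch)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 8 consumers in one group, topic zeta with 4 partitions.
group = [f"consumer-{i}" for i in range(8)]
assignment = assign_partitions(group, num_partitions=4)
idle = [consumer for consumer, parts in assignment.items() if not parts]
print(assignment)
print(len(idle))  # 4 consumers have nothing to read

# With only 2 consumers in the group, each one reads 2 partitions.
small_group = assign_partitions(["consumer-a", "consumer-b"], num_partitions=4)
print(small_group)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Consumers in a different group would go through the same assignment on their own, so every group sees all four partitions regardless of what the other groups do.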