Tags: apache-kafka, kafka-consumer-api, kafka-producer-api

Topics, partitions and keys


I am looking for some clarification on the subject. In the Kafka documentation I found the following:

Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.

So here are my questions:

  1. Does it mean that if I want more than one consumer (from the same group) reading from one topic, I need more than one partition?

  2. Does it mean I need the same number of partitions as there are consumers in the same group?

  3. How many consumers can read from one partition?

I also have some questions about the relationship between keys and partitions with regard to the API. I have only looked at the .NET APIs (especially the one from MS), but they look like they mimic the Java API. I see that when a producer sends a message to a topic there is a key parameter, but when a consumer reads from a topic there is a partition number.

  1. How are partitions numbered? Starting from 0 or 1?
  2. What exactly is the relationship between a key and a partition? As I understand it, some function of the key determines the partition. Is that correct?
  3. If I have 2 partitions in a topic and want some particular messages to go to one partition and the other messages to go to the other, should I use one specific key for the one partition and other keys for the rest?
  4. What if I have 3 partitions and want one type of message to go to one particular partition and the rest to the other 2?
  5. In general, how do I send messages to a particular partition so that a consumer knows where to read from? Or am I better off with multiple topics?

Thanks in advance.


Solution

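  • Partitions increase the parallelism of a Kafka topic. Any number of producers can write to the same partition, and any number of consumer groups can read from it. Within a single consumer group, however, each partition is read by only one consumer, which is why the documentation quoted in the question says that a single-partition topic means only one consumer process per consumer group. So a group needs at least as many partitions as consumers to keep every consumer busy; the numbers do not have to match, since one consumer can read several partitions. Beyond that, it is up to the application layer to define the protocol. Kafka guarantees delivery. Regarding the API, you may want to look at the Java docs, as they may be more complete. Based on my experience: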

    1. Partitions start from 0
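    2. Keys can be used to send related messages to the same partition, for example via hash(key) % num_partitions (the default partitioner in the Java client hashes the serialized key with murmur2). The partitioning logic is pluggable in the producer: https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/Partitioner.html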
    3. Yes, but be careful not to end up with some other key that also maps to the "dedicated" partition. To avoid this, you may want a dedicated topic instead, for example a control topic and a data topic.
    4. This seems to be the same question as 3.
    5. I believe consumers should not make assumptions about the data based on its partition. The typical approach is to have a consumer group that reads from multiple partitions of a topic. If you want dedicated channels, it is better (safer and more maintainable) to use separate topics.
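
To make the two ideas above a bit more concrete, here is a minimal Python sketch, not Kafka code. It assumes a CRC32 hash as a stand-in for the real one (the Java client's default partitioner uses murmur2, so actual partition numbers will differ) and a naive round-robin assignment in place of Kafka's real group assignors. The names partition_for and assign_partitions are made up for illustration.

```python
import zlib
from typing import Dict, List


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index in the range 0 .. num_partitions - 1.

    Assumption: CRC32 is only a stand-in hash for illustration; Kafka's
    default Java partitioner uses murmur2 on the serialized key.
    """
    return zlib.crc32(key) % num_partitions


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions of one topic over the consumers of ONE group.

    Simplified round-robin, not Kafka's actual assignor. The property it
    illustrates is the real one: each partition goes to exactly one
    consumer in the group, so surplus consumers get nothing to read.
    """
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    # The same key always lands in the same partition, but nothing stops
    # two different keys from sharing one.
    for key in (b"control", b"user-1", b"user-2", b"user-1"):
        print(key, "->", partition_for(key, 3))

    # 3 partitions, 2 consumers: one consumer reads two partitions.
    print(assign_partitions(3, ["c1", "c2"]))
    # 2 partitions, 3 consumers: the third consumer sits idle.
    print(assign_partitions(2, ["c1", "c2", "c3"]))
```

Because any key can hash into any partition, reserving a partition for one kind of message requires a custom partitioner that keeps every other key out of it, which is part of why separate topics are usually the simpler choice.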