apache-kafka, apache-kafka-streams

Co-partitioning on Kafka without Kafka-Streams


I'm curious whether Kafka co-partitions topics within a consumer group by default, or whether that functionality is added by Kafka Streams.

For example: assume I have a consumer group group-1 that consumes topics A and B, both of which have 10 partitions. If I then produce records to both A and B with the same key ID-1, am I guaranteed that the partitions of A and B containing the records with ID-1 will end up on the same consumer instance?

Thank you!


Solution

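  • Co-partitioning is about how data is distributed across partitions. Kafka itself doesn't manage keys; it only knows about partitions. The premise is therefore that the same key always goes to the same partition number, which depends on the partitioner.class used by your Kafka producer. The default partitioner does exactly this for records with a non-null key: it hashes the serialized key and takes the result modulo the topic's partition count. That only lines up across A and B if both topics have the same number of partitions and are written with the same partitioner and key serialization.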
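To make the producer-side premise concrete, here is a minimal sketch of hash-modulo partition selection. The class and method names are hypothetical, and Arrays.hashCode is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to the serialized key, so the partition numbers it prints will not match a real cluster. The structure (positive hash, then modulo the partition count) is the part that matters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of the Kafka client API.
final class KeyPartitionSketch {

    // Maps a key to a partition number: hash the key bytes, clear the sign bit,
    // then take the result modulo the partition count.
    // ASSUMPTION: Arrays.hashCode stands in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key, same partition count: the partition number is identical for A and B.
        System.out.println("A-" + partitionFor("ID-1", 10));
        System.out.println("B-" + partitionFor("ID-1", 10));
        // A different partition count gives no such guarantee.
        System.out.println("12 partitions: " + partitionFor("ID-1", 12));
    }
}
```

The function depends only on the key bytes and the partition count, which is why both topics need the same number of partitions for ID-1 to land on the same partition number in each.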

    On the consumer side, co-partitioning depends on partition.assignment.strategy. The default setting is range. When the group members subscribe to topics that have the same number of partitions and use consistent key partitioning, range assigns each member the same partition numbers from every topic (p0 of A together with p0 of B, p1 of A together with p1 of B, and so on), so co-partitioning holds.
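The range behaviour can be sketched as a small simulation. This is not the Kafka RangeAssignor class itself, just an illustration of its per-topic arithmetic under the assumption that every member subscribes to every topic. The member names c0, c1, c2 and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of range-style assignment; not the Kafka implementation.
final class RangeAssignmentSketch {

    // Returns member -> assigned "topic-partition" labels.
    // ASSUMPTION: every member subscribes to every topic in partitionsPerTopic.
    static Map<String, List<String>> assign(List<String> members,
                                            Map<String, Integer> partitionsPerTopic) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted);
        Map<String, List<String>> result = new TreeMap<>();
        for (String member : sorted) {
            result.put(member, new ArrayList<>());
        }
        // Each topic is divided independently into contiguous ranges.
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            int numPartitions = topic.getValue();
            int perMember = numPartitions / sorted.size();
            int extra = numPartitions % sorted.size(); // first 'extra' members get one more
            for (int i = 0; i < sorted.size(); i++) {
                int start = perMember * i + Math.min(i, extra);
                int length = perMember + (i < extra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    result.get(sorted.get(i)).add(topic.getKey() + "-" + p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new TreeMap<>();
        topics.put("A", 10);
        topics.put("B", 10);
        System.out.println(assign(Arrays.asList("c0", "c1", "c2"), topics));
    }
}
```

With three members and 10 partitions per topic, c0 gets partitions 0 to 3 of both A and B, c1 gets 4 to 6 of both, and c2 gets 7 to 9 of both. Because the ranges are computed per topic from the same inputs, equal partition counts give identical partition numbers to each member.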

    Using round-robin as the partition.assignment.strategy, for instance, would break co-partitioning. Round-robin is useful when the consumed topics have different numbers of partitions and you want to spread partitions evenly across consumer instances (range would give the members unequal shares in that case). This is also why Kafka Streams can repartition topics: it ensures they have the same number of partitions so that the data can be processed co-partitioned.
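For contrast, here is a minimal sketch of round-robin style assignment, again a simulation rather than Kafka's RoundRobinAssignor class, assuming every member subscribes to every topic. The class name and member names are hypothetical. It lays out all topic-partitions in one sorted sequence and deals them out one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of round-robin style assignment; not the Kafka implementation.
final class RoundRobinAssignmentSketch {

    // Returns member -> assigned "topic-partition" labels.
    // ASSUMPTION: every member subscribes to every topic in partitionsPerTopic.
    static Map<String, List<String>> assign(List<String> members,
                                            Map<String, Integer> partitionsPerTopic) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted);
        Map<String, List<String>> result = new TreeMap<>();
        for (String member : sorted) {
            result.put(member, new ArrayList<>());
        }
        // One running counter across all topics: the position in the combined
        // sequence, not the partition number, decides which member gets a partition.
        int next = 0;
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                result.get(sorted.get(next % sorted.size())).add(topic.getKey() + "-" + p);
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new TreeMap<>();
        topics.put("A", 10);
        topics.put("B", 10);
        System.out.println(assign(Arrays.asList("c0", "c1", "c2"), topics));
    }
}
```

With three members and 10 partitions per topic, A-0 goes to c0 but B-0 goes to c1, so records with the same key in A and B would be handled by different instances. Note that with some member counts (two members here, for example) the partition numbers happen to line up, but that is a coincidence of the arithmetic and not something to rely on.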