Search code examples
apache-kafkakafka-consumer-apikafka-producer-api

How to change partitioner logic in a live system


In a Kafka deployment a custom topic partitioner logic is used to route all messages that belong to the same root entity (for example all message for particular user) to the same partition.

Can anyone recommend a strategy on how to deal with partitioning logic change in such live system?

One example that affects the partitioning is the obvious change of the partitioner implementation. The other example would be change of the number of partitions for a given topic.

In both cases, we would end up in a situation where some of the messages for user A, that entered the Kafka before the change, will be in partition 1, while after the change in partitioning logic or number of partitions messages for that same user A will go the partition 2.

This can lead to a problem where messages for user A are processed out of order. Consumer reading the messages from partition 2 could process messages before the consumer that reads the messages from partition 1.

Have anyone faced this issue in live system? How did you or would you solve this issue?

This seems like a very common scenario, but I was not able to find anything about it.

Thanks


Solution

  • The best way to change how records are partitioned is to use the default Apache Kafka® partitioner, and change the record keys. If all records from a user need to go to the same topic then make sure they all have the same key.

    If you'd like to change the keys for a whole set you can use KSQL to re-key (republish to a new topic with new keys) the data using the PARTITION BY function.