apache-kafka data-partitioning

How Kafka Maps Keyed Messages to Partitions


Can anyone explain:

  1. How does Kafka actually store keyed messages? Is a partition assigned to only one key, or can a partition store messages with multiple keys?
  2. If the answer to the first question is yes, what happens when the number of keys is greater than the number of partitions available?

My use case: I am considering sending a lot of ship data to the brokers, keyed by ship_id (the MMSI, if you know it). The problem is that I don't know how many ships will be received, so I can't define the number of partitions in advance.


Solution

  • Is it possible that a partition stores messages with multiple keys?

    Yes. The default partitioner hashes the key with murmur2 and takes the result modulo the number of partitions in the topic, so different keys can produce the same partition number. For example, if you have only one partition, every key obviously goes to that partition.

  • What if the number of keys is greater than the number of partitions available?

    The hash is taken modulo the number of partitions, so every key is always assigned a valid partition, however many distinct keys there are.
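
The mapping described in these two answers can be sketched in Python. This is a minimal sketch, assuming the behaviour of the Java client's default partitioner for keyed records: murmur2 over the serialized key bytes, sign bit cleared, then modulo the partition count. The names `murmur2` and `partition_for` are illustrative, the port has not been checked against a running broker, and the ship IDs are made up.

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2, ported from the variant in Kafka's Java client (assumption)."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & MASK32  # seed XOR length

    # Mix the input four bytes at a time (little-endian).
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Fold in the trailing 1-3 bytes, if any.
    tail = length % 4
    base = length - tail
    if tail == 3:
        h ^= data[base + 2] << 16
    if tail >= 2:
        h ^= data[base + 1] << 8
    if tail >= 1:
        h ^= data[base]
        h = (h * m) & MASK32

    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Clear the sign bit, then take the hash modulo the partition count."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    # Hypothetical ship IDs: far more keys than partitions.
    keys = [str(200000000 + i) for i in range(1000)]
    by_partition = defaultdict(list)
    for key in keys:
        by_partition[partition_for(key, 6)].append(key)
    for p in sorted(by_partition):
        print(f"partition {p}: {len(by_partition[p])} keys")
```

With 1000 keys and 6 partitions, every key still gets a partition number between 0 and 5, and each partition ends up holding many different keys. The same key always maps to the same partition as long as the partition count stays the same.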

    Now, if you have a well-defined key, messages with the same key always land in the same partition and are therefore ordered relative to each other. The right number of partitions then comes down to how much throughput a single partition can handle, and there is no short answer: how much data are you sending, and how fast can one consumer read that data from one partition at peak consumption? Run appropriate performance tests, then scale the partition count up in new topics to handle potential future load. (Adding partitions to an existing topic changes which partition a key hashes to.)
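
One way to turn such test results into a starting partition count is simple arithmetic: divide the peak rate you need by what a single partition sustained in your tests, on both the producing and the consuming side, and take the larger result. The sketch below assumes that rule of thumb. The function name, the `headroom` parameter, and every number are hypothetical placeholders, not measurements.

```python
import math


def estimate_partitions(peak_mb_s: float,
                        producer_mb_s_per_partition: float,
                        consumer_mb_s_per_partition: float,
                        headroom: float = 1.0) -> int:
    """Rough starting partition count from measured per-partition throughput."""
    if min(peak_mb_s, producer_mb_s_per_partition,
           consumer_mb_s_per_partition, headroom) <= 0:
        raise ValueError("all inputs must be positive")
    target = peak_mb_s * headroom  # leave room for future load
    needed = max(target / producer_mb_s_per_partition,
                 target / consumer_mb_s_per_partition)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # Placeholder figures: 50 MB/s peak, 10 MB/s produce and 5 MB/s consume per partition.
    print(estimate_partitions(50, 10, 5))                # 10
    print(estimate_partitions(50, 10, 5, headroom=1.5))  # 15
```

Here the slower side (the consumer) sets the count, which matches the point above that peak consumption from a single partition is the figure to measure.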

    You'll also need to consider "hot" and "cold" data. For example, if you had 10 partitions mapped to the first digit of the ID and all of your IDs started with an even digit, half of the partitions would stay empty.
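
That skew is easy to reproduce. The sketch below assumes a hypothetical scheme that routes each ID to the partition equal to its first digit, then feeds it made-up IDs that all start with an even digit. Neither the function names nor the IDs come from a real system.

```python
from collections import Counter


def first_digit_partition(ship_id: str) -> int:
    """Hypothetical partitioner: the first digit of the ID is the partition (0-9)."""
    return int(ship_id[0])


def partition_load(ids, num_partitions=10):
    """Count how many IDs land in each partition."""
    counts = Counter(first_digit_partition(i) for i in ids)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    # Made-up IDs whose leading digit is always even (0, 2, 4, 6 or 8).
    ids = [f"{lead}{n:08d}" for lead in "02468" for n in range(100)]
    load = partition_load(ids)
    print(load)  # [100, 0, 100, 0, 100, 0, 100, 0, 100, 0]
    print("empty partitions:", [p for p, n in enumerate(load) if n == 0])  # [1, 3, 5, 7, 9]
```

Hashing the whole key, rather than deriving the partition from one digit, is what spreads such IDs more evenly, though a few very busy keys can still make individual partitions hot.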