Can anyone explain:
My use case is that I am considering sending a lot of ship data to the brokers and storing it with ship_id
(MMSI, if you know it) as the key. The problem is that I don't know in advance how many ships will be received, so I can't define the number of partitions up front.
Is it possible for a partition to store messages with multiple keys?
Yes. Kafka's default partitioner hashes the key with murmur2 and takes the result modulo the number of partitions in the topic, so different keys can end up with the same partition number. For example, if you have only one partition, every key obviously goes to that partition.
What if the number of keys is greater than the number of partitions available?
The hash is taken modulo the partition count, so every key is always assigned a valid partition.
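To make the hash-then-modulo idea concrete, here is a minimal sketch. Note the assumptions: it uses CRC32 from the standard library as a stand-in hash (Kafka's default partitioner uses murmur2, so the actual partition numbers will differ), and `pick_partition` and the ship IDs are made up for illustration.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    # Stand-in hash: CRC32, NOT the murmur2 hash Kafka really uses.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # The modulo keeps the result valid however many distinct keys exist.
    return key_hash % num_partitions


num_partitions = 3
# Hypothetical ship IDs: more keys than partitions.
ship_ids = ["244660001", "244660002", "366999001", "538001234", "636012345"]

assignment = {ship_id: pick_partition(ship_id, num_partitions) for ship_id in ship_ids}
for ship_id, partition in assignment.items():
    print(ship_id, "->", partition)
```

With five keys and three partitions, at least two keys have to share a partition, and the same key always lands on the same one as long as the partition count stays unchanged.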
Now, if you have a well-defined key, messages with the same key are guaranteed to stay in order within their partition. So the number of partitions really comes down to how much throughput a single partition can handle, and there is no short answer: how much data are you sending, and how fast can one consumer read that data from one partition at "peak" consumption? Do appropriate performance tests, then scale the partition count up on new topics to handle potential future load (adding partitions to an existing topic changes which partition a key maps to).
You'll also need to consider "hot" and "cold" data. For example, if you had 10 partitions that mapped to the first digit of the ID, and all your IDs started with even digits, half of the partitions would stay empty.
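The skew in that example is easy to reproduce. This is only a sketch of the hypothetical "first digit" scheme described above, not something Kafka does by default; `first_digit_partition` and the generated IDs are invented for the illustration.

```python
from collections import Counter

NUM_PARTITIONS = 10


def first_digit_partition(ship_id: str) -> int:
    """Hypothetical partitioner: use the first digit of the ID as the partition."""
    return int(ship_id[0]) % NUM_PARTITIONS


# Made-up IDs that all start with an even digit (0, 2, 4, 6 or 8).
ship_ids = [f"{digit}{n:08d}" for digit in "02468" for n in range(100)]

# Count how many IDs land on each partition.
load = Counter(first_digit_partition(ship_id) for ship_id in ship_ids)
empty = [p for p in range(NUM_PARTITIONS) if load[p] == 0]

print("messages per partition:", dict(sorted(load.items())))
print("empty partitions:", empty)  # -> [1, 3, 5, 7, 9]
```

Hashing the whole key, as in the modulo approach above, usually spreads keys more evenly, but it is still worth checking the distribution against your real IDs.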