I want to use the key/value pattern when writing to Kafka so that data is read in the same order in which it was written. My question is should the number of partitions in the topic be equal to the number of different keys in the incoming data. I already know that with the key/value pattern, data having the same key will go to the same partition.
Hence, if the number of partitions is not equal to the number of different keys in the data, we can have data having different keys in the same partition? In this case, how is data order kept?
My question is should the number of partitions in the topic be equal to the number of different keys in the incoming data.
I don't think that this is generally a good idea. It depends entirely on the data you are processing. If you have a fixed number of keys (such as female, male and diverse), it might make sense. However, even then you need to be careful, as this could lead to an imbalance of data load across the brokers: there may be far fewer records with the key diverse than with the other keys. So you could end up having most of the data in one partition, whereas the other partition(s) would be left nearly empty. In general, the number of partitions should be chosen based on your throughput requirements.
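To illustrate the imbalance, here is a minimal sketch of how keyed records spread over partitions. It assumes the usual "hash of the key modulo number of partitions" scheme. Kafka's default partitioner in the Java client uses a murmur2 hash, while this sketch uses `zlib.crc32` as a stand-in, so the concrete partition numbers will differ from a real cluster. The key counts are made up for illustration.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (crc32), NOT Kafka's murmur2; only the principle is the same.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def load_per_partition(keys, num_partitions):
    # Count how many records land in each partition.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical, skewed key distribution.
keys = ["female"] * 480 + ["male"] * 500 + ["diverse"] * 20

load = load_per_partition(keys, 3)
print(load)
```

With only three distinct keys, at most three partitions can ever receive data, no matter how many partitions the topic has, and two keys may still hash to the same partition.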
Hence, if the number of partitions is not equal to the number of different keys in the data, we can have data having different keys in the same partition? In this case, how is data order kept?
Yes, you could end up having different keys in the same partition. The ordering is then kept within each partition, but it is not guaranteed across the topic as a whole.
So assume you have the keys A, B, and C and a topic with two partitions. A and C go to the first partition and B is stored in the second partition. If data is flowing like this: A/V1, A/V2, B/V1, C/V1, B/V2
Then your partitions will be filled like this:
Partition 0: A/V1, A/V2, C/V1
Partition 1: B/V1, B/V2
When consuming this topic, the order of the A and C messages relative to the B messages is not defined. However, it is always guaranteed that the message A/V1 is consumed before A/V2, A/V2 before C/V1, and B/V1 before B/V2.
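The example can be replayed with a small simulation. This is only a sketch: the key-to-partition assignment is hard-coded to match the example (A and C in the first partition, B in the second), whereas a real producer would derive it by hashing the key.

```python
from collections import defaultdict

# Assignment taken from the example; hard-coded here purely for illustration.
KEY_TO_PARTITION = {"A": 0, "C": 0, "B": 1}


def fill_partitions(records):
    # Append each record to its partition in arrival order,
    # which is how a partition preserves ordering.
    partitions = defaultdict(list)
    for key, value in records:
        partitions[KEY_TO_PARTITION[key]].append(f"{key}/{value}")
    return dict(partitions)


records = [("A", "V1"), ("A", "V2"), ("B", "V1"), ("C", "V1"), ("B", "V2")]
partitions = fill_partitions(records)

for p in sorted(partitions):
    print(f"Partition {p}: {', '.join(partitions[p])}")
```

Each list keeps its own arrival order, but nothing records how entries of partition 0 were interleaved with entries of partition 1.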
If you are looking for a more flexible way of directing your messages into partitions, you can also consider writing a custom partitioner.
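As a rough sketch of what such a partitioner could look like: the function below pins one hypothetical "hot" key to its own partition and hashes all other keys over the remaining partitions. The signature `(key_bytes, all_partitions, available_partitions)` is the one I believe the kafka-python client expects for the `partitioner` argument of `KafkaProducer`, so treat that as an assumption and check it against your client version. `HOT_KEY` and the routing rule are invented for illustration, and `zlib.crc32` again stands in for a real hash.

```python
import random
import zlib

HOT_KEY = b"A"  # hypothetical key that should get a partition of its own


def custom_partitioner(key_bytes, all_partitions, available_partitions):
    # The hot key always goes to the first partition.
    if key_bytes == HOT_KEY:
        return all_partitions[0]
    # Everything else is spread over the remaining partitions
    # (or the only partition, if the topic has just one).
    rest = all_partitions[1:] or all_partitions
    if key_bytes is None:
        # Unkeyed records carry no per-key ordering, so any partition will do.
        return random.choice(rest)
    # Same key -> same partition, so per-key ordering is preserved.
    return rest[zlib.crc32(key_bytes) % len(rest)]
```

With the Java client, the equivalent is to implement the `org.apache.kafka.clients.producer.Partitioner` interface and set it via the `partitioner.class` producer config. Whatever rule you choose, keep it deterministic per key, otherwise the ordering guarantee for that key is lost.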