I am building a distributed application and I decided to introduce Kafka to it. I am however having a tough time figuring out something.
I understand that having a consumer group ensures HA and high message throughput. Each consumer in the group, though, is "following" only one partition in the topic. Let's say for example that we have 1 topic with 4 partitions and one consumer group with 4 consumers. Each consumer, as mentioned earlier, will get messages only from its designated partition.
Now let's say that we have a number of producers publishing messages to the topic. One producer writes a message to partition 1 of the topic, and consumer 1 receives it and performs some logic with it, so it is busy. Then another message is published, again to the same partition. None of the other consumers will be able to receive it, as the partition does not "belong" to them.
I am looking for a way that whenever a new message is produced at least one idle consumer will receive it instantly even if it is not written to its partition.
As far as I know, this is not possible with Kafka. As you correctly described, at most one consumer in a group can read from a given partition. That is how Kafka guarantees the ordering of messages within a partition.
What you could do to keep the application from getting even slower when one of the consumers dies is to run more consumers than partitions. That way you will have some idle consumers on standby. When a working consumer dies, the group rebalances and a standby is assigned the orphaned partition, resuming from its last committed offset.
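To make the standby idea concrete, here is a minimal sketch in plain Python. It is a simplified model of a consumer group, not Kafka's actual assignment logic: the `assign` function and the consumer ids `c1` to `c5` are made up for illustration, and a real rebalance is driven by the group coordinator rather than by code like this.

```python
def assign(partitions, consumers):
    """Give every partition to exactly one consumer, round-robin.

    Simplified model only; Kafka's real assignors are more involved.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
# Hypothetical consumer ids: one more consumer than there are partitions.
group = ["c1", "c2", "c3", "c4", "c5"]

before = assign(partitions, group)
# c5 gets no partition here and sits idle as a standby.

group.remove("c2")  # simulate c2 dying, which triggers a rebalance
after = assign(partitions, group)
# Every partition has exactly one owner again, and c5 is now working.
```

Note that the extra consumer never shares a partition with a busy one. It only helps once a partition loses its owner.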
You could also increase the number of partitions to avoid putting too much load on any one of them. Alternatively, if you know your data in advance, you could use a custom partitioner in your Kafka producer that distributes the messages so that the processing is spread evenly over the partitions and their consumers.
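As a rough illustration of the custom partitioner idea, the sketch below shows only the partition-selection logic in plain Python. Everything in it is an assumption for the example: the key names, the `HEAVY_KEYS` table, and the use of CRC32 as a stand-in hash (Kafka's default partitioner hashes keys with murmur2). In a Java producer, logic like this would live in a class implementing the `Partitioner` interface, registered through the `partitioner.class` producer setting.

```python
import zlib

NUM_PARTITIONS = 4

# Hypothetical: keys known in advance to be expensive to process
# each get a partition (and therefore a consumer) to themselves.
HEAVY_KEYS = {"customer-a": 0, "customer-b": 1}

# All remaining keys share the other partitions.
SHARED_PARTITIONS = [2, 3]


def choose_partition(key: str) -> int:
    """Pick a partition for a message key.

    The same key always maps to the same partition, so per-key
    ordering is preserved.
    """
    if key in HEAVY_KEYS:
        return HEAVY_KEYS[key]
    # CRC32 is a deterministic stand-in for a real partitioner hash.
    digest = zlib.crc32(key.encode("utf-8"))
    return SHARED_PARTITIONS[digest % len(SHARED_PARTITIONS)]
```

Whether pinning heavy keys like this actually evens out the load depends entirely on how well you know your traffic, so treat the table as something to tune rather than a fixed rule.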