Tags: apache-kafka, akka-stream, alpakka

How to force Alpakka-kafka to read round-robin from topic partitions?


I would like to ask for some input on the following question. I'm using a Consumer.committableSource in my application. During tests I discovered that, instead of going round-robin among the partitions of the Kafka topic, the application drains a given partition until it has consumed the latest entry before switching to the next partition. This is not ideal for my application, because it cares about the temporal order in which the events were put on Kafka. Reading partitions exhaustively like this is like going back and forth in time.

Any ideas on how I can tune the consumer to favor round-robin on partition consumption instead? Thank you!


Solution

  • There are two ways to handle this. The first is preferable because it parallelizes consumption, giving high throughput with low latency.

    1. Run multiple instances of the same consumer. They will work as a consumer group, and the instances will share the partition load in parallel. For example, if you have 4 partitions and run 2 instances, each instance will ideally consume 2 partitions. If you increase the number of instances to 4, each instance will ideally consume 1 partition. In this setup, partition rebalancing is handled by the consumer group management.

    2. You can also assign a list of partitions to the consumer manually, using the API below:

    public void assign(java.util.Collection<TopicPartition> partitions)

    This manually assigns the given list of partitions to the consumer, so the consumer reads only from the assigned partitions. Manual assignment does not use the consumer group's rebalancing.
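
To make the arithmetic in option 1 concrete, here is a minimal sketch of how partitions might be spread evenly over consumer instances. It is a plain-Java illustration only, not Kafka's actual assignor code: the class name `PartitionSpread` and the method `spread` are hypothetical, and the real assignment depends on the strategy configured for the consumer group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is NOT Kafka's assignor implementation.
class PartitionSpread {

    // Deals partitions 0..partitionCount-1 to the instances in turn,
    // so every instance ends up with an (almost) equal share.
    static List<List<Integer>> spread(int partitionCount, int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("instanceCount must be positive");
        }
        List<List<Integer>> owned = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            owned.get(p % instanceCount).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // 4 partitions, 2 instances: each instance owns 2 partitions.
        System.out.println(spread(4, 2)); // [[0, 2], [1, 3]]
        // 4 partitions, 4 instances: each instance owns 1 partition.
        System.out.println(spread(4, 4)); // [[0], [1], [2], [3]]
    }
}
```

With 4 instances each one reads a single partition, so no instance has to switch between partitions at all. Ordering across instances is still not coordinated, since Kafka only guarantees order within a partition.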