I am building a Kafka consumer using Spring Boot, and I have more than one topic to consume data from. I don't want to create a Spring Boot app per topic, to avoid the maintenance overhead. If I make this one app consume from multiple topics, and there are more consumers than partitions for some of these topics (as the partition count of each topic may not be the same), would it cause any issues in my app, like frequent rebalancing?
As I have yet to implement this, do you have any recommendation on the approach to follow?
There will be no issues. If some topic(s) have fewer partitions than consumers, then some consumers simply won't be assigned a partition from those topic(s).
However, the default partition distribution may not be what you want.
When listening to multiple topics, the default partition distribution may not be what you expect. For example, if you have three topics with five partitions each and you want to use concurrency=15, you see only five active consumers, each assigned one partition from each topic, with the other 10 consumers being idle. This is because the default Kafka PartitionAssignor is the RangeAssignor (see its Javadoc). For this scenario, you may want to consider using the RoundRobinAssignor instead ...
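
To make the arithmetic in that example concrete, here is a minimal sketch that simulates both strategies for three topics with five partitions each and 15 consumers. It is a simplified model written for illustration, not Kafka's own assignor code: the class AssignorSketch, its method names, and the topic0/topic1/topic2 labels are all hypothetical, and it assumes every consumer subscribes to every topic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of range vs. round-robin assignment (illustrative only;
// this is NOT Kafka's implementation, and all names here are hypothetical).
public class AssignorSketch {

    // Range-style: each topic is split independently across the consumers,
    // so the same low-numbered consumers win a partition from every topic.
    static List<List<String>> rangeAssign(int topics, int partitions, int consumers) {
        List<List<String>> result = emptyAssignment(consumers);
        int quota = partitions / consumers;   // partitions every consumer gets per topic
        int extra = partitions % consumers;   // first 'extra' consumers get one more
        for (int t = 0; t < topics; t++) {
            for (int c = 0; c < consumers; c++) {
                int start = quota * c + Math.min(c, extra);
                int length = quota + (c < extra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    result.get(c).add("topic" + t + "-" + p);
                }
            }
        }
        return result;
    }

    // Round-robin-style: all topic-partitions are laid out in one list and
    // dealt to the consumers in turn.
    static List<List<String>> roundRobinAssign(int topics, int partitions, int consumers) {
        List<List<String>> result = emptyAssignment(consumers);
        int next = 0;
        for (int t = 0; t < topics; t++) {
            for (int p = 0; p < partitions; p++) {
                result.get(next % consumers).add("topic" + t + "-" + p);
                next++;
            }
        }
        return result;
    }

    // Number of consumers that received at least one partition.
    static long activeConsumers(List<List<String>> assignment) {
        return assignment.stream().filter(a -> !a.isEmpty()).count();
    }

    private static List<List<String>> emptyAssignment(int consumers) {
        List<List<String>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 topics x 5 partitions, concurrency = 15
        System.out.println("range active: " + activeConsumers(rangeAssign(3, 5, 15)));            // 5
        System.out.println("round-robin active: " + activeConsumers(roundRobinAssign(3, 5, 15))); // 15
    }
}
```

In a Spring Boot app, one way to switch strategies is to set the Kafka consumer property partition.assignment.strategy to org.apache.kafka.clients.consumer.RoundRobinAssignor, for example via spring.kafka.consumer.properties.partition.assignment.strategy in application.properties.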