Search code examples
apache-flinkflink-streaming

FlinkKafkaSource from Multiple Kafka Topics


I am trying to consume from Multiple Kafka Topics using FlinkKafkaSource.

I am trying to build a monitoring dashboard to capture the Metrics like how many messages are sent to these topics etc.

I can create multiple sources (one for each Topic) and join them. How ever FlinkKafkaConsumer allows you to pass a List of Topics so it will be less complex if i create a Single Source and consume from All topics.

Are there any downsides of doing this compared to creating one Source for each topic. (How many concurrent Consumers does Flink create for each Topic/Partition. Is this Configurable ? For ex if i am using SpringBoot i can specify the concurrency on the ConcurrentKafkaListenerContainerFactory)

If Flink uses the same concurrency i.e, whether i use a Single Topic or Multiple Topics then i think using Single Source might limit the amount of messages i can consume.

Thanks Sateesh


Solution

  • The KafkaTopicPartitionAssigner distributes the partitions of each topic uniformly across the subtasks in a round-robin fashion. The subtask to which partition 0 is assigned is determined using the topic name.

    This is intended to evenly distribute the load among the parallel workers without requiring any intervention on your part. But if you do want explicit, fine-grained control, you should stick to instantiating separate consumers.