apache-kafka apache-flink kafka-consumer-api

Is using Kafka as an input source for Flink a performance bottleneck?


Flink can read from a Kafka topic. Is that a performance bottleneck that makes Flink slower overall?


Solution

  • Kafka topics scale horizontally: adding partitions accommodates higher throughput.

    Each Kafka partition is read by exactly one Flink source subtask. A subtask can read several partitions, but a single partition is never split across subtasks.

    So, if you have only 1 Kafka partition and N+1 parallel Flink source tasks, N of those tasks will sit idle. That could be a bottleneck, sure, but it is the tradeoff you accept for total ordering within a Kafka topic, not necessarily a Flink problem.

    Otherwise, you would create your Kafka topics with tens to hundreds of partitions, and Flink would have no trouble consuming them in parallel.
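
To make the idle-task arithmetic concrete, here is a minimal sketch in plain Java. It is a simplified model, not Flink's actual assignment code: it assumes partitions are handed out to source subtasks by a plain modulo on the partition id, and the class and method names (`PartitionAssignmentSketch`, `assign`, `idleSubtasks`) are made up for illustration. The point it shows holds regardless of the exact assignment scheme: with fewer partitions than subtasks, some subtasks get nothing to read.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of partition-to-subtask assignment.
// Assumption: plain modulo distribution; this is NOT Flink's real implementation.
class PartitionAssignmentSketch {

    // For each source subtask, returns the list of partition ids it would read.
    static List<List<Integer>> assign(int numPartitions, int parallelism) {
        List<List<Integer>> bySubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            bySubtask.add(new ArrayList<>());
        }
        // Each partition goes to exactly one subtask; a subtask may own several.
        for (int p = 0; p < numPartitions; p++) {
            bySubtask.get(p % parallelism).add(p);
        }
        return bySubtask;
    }

    // Counts subtasks that own no partition and therefore sit idle.
    static int idleSubtasks(int numPartitions, int parallelism) {
        int idle = 0;
        for (List<Integer> owned : assign(numPartitions, parallelism)) {
            if (owned.isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println(idleSubtasks(1, 4));   // prints 3
        System.out.println(idleSubtasks(100, 4)); // prints 0
        System.out.println(assign(6, 4));         // prints [[0, 4], [1, 5], [2], [3]]
    }
}
```

With 1 partition and 4 subtasks, 3 subtasks are idle, which is the "N+1 tasks, N idle" case above. With 100 partitions, every subtask has work, so the source parallelism is fully used.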