apache-kafka, high-availability

Kafka scalability if consuming from replica node


In a cluster with a replication factor > 1, why must we always consume from the master/leader of a partition, instead of being able to consume from a replica/follower node that holds a copy of that partition?

I understand that Kafka will always route the request to the master node (of that particular partition/topic), but doesn't this affect scalability, since all requests go to a single node? Wouldn't it be better if we could read from any node that holds a replica, and not necessarily the master?


Solution

  • Partition leaders, which serve both reads and writes, are distributed evenly across the available brokers, so for a topic with several partitions the load is spread over the cluster rather than concentrated on a single node. You can also use the "fetch from closest replica" feature, described in KIP-392 and available since Kafka 2.4.0, which lets consumers read from a follower replica.
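
As a rough sketch of what enabling KIP-392 involves: each broker is given a rack id and a rack-aware replica selector, and each consumer declares its own rack via `client.rack`. The snippet below only builds the relevant properties with plain `java.util.Properties`; the class name, host name, group id, and rack id are hypothetical placeholders, and in a real application the consumer properties would be passed to a `KafkaConsumer`.

```java
import java.util.Properties;

public class ClosestReplicaConfig {

    // Broker-side settings (normally placed in each broker's server.properties).
    // The rack id is a placeholder, e.g. an availability zone name.
    static Properties brokerOverrides(String rackId) {
        Properties p = new Properties();
        p.setProperty("broker.rack", rackId);
        // Selector introduced by KIP-392; prefers a replica in the consumer's rack.
        p.setProperty("replica.selector.class",
                "org.apache.kafka.common.replica.RackAwareReplicaSelector");
        return p;
    }

    // Consumer-side settings. client.rack should match the broker.rack value
    // of the brokers closest to this consumer.
    static Properties consumerProps(String bootstrapServers, String groupId, String rackId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        p.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("client.rack", rackId);
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(brokerOverrides("us-east-1a"));
        System.out.println(consumerProps("broker1:9092", "example-group", "us-east-1a"));
    }
}
```

If `client.rack` is left unset, or no in-sync replica exists in the matching rack, the consumer keeps fetching from the partition leader as before.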