apache-kafka load-balancing kafka-consumer-api

How to load balance Kafka?


Can anyone help me with load balancing in Kafka? What logic needs to be implemented? Would deploying a multi-broker, multi-node Kafka cluster resolve the issue? Also, how does increasing the number of partitions affect load balancing and throughput in Kafka?


Solution

  • If you mean scaling a Kafka cluster, the bare minimum you need to do is:

    • Add more brokers to the cluster
    • Rebalance the topics and partitions

    This is described in the Kafka documentation on expanding a cluster: https://kafka.apache.org/documentation/#basic_ops_cluster_expansion

    Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your new servers. However these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.

    Consumers and producers will automatically rebalance to use the new nodes once their partitions have been moved to the new nodes.
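
The migration step above is usually driven by a JSON reassignment plan that is passed to the `kafka-reassign-partitions.sh` tool. Below is a minimal sketch of building such a plan by spreading replicas round-robin over an enlarged broker list. The topic name `orders`, the broker ids, and the helper `build_reassignment` are hypothetical and chosen only for illustration; the tool can also propose a plan itself via its `--generate` option, which is likely the safer route on a real cluster.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan that spreads replicas round-robin over broker_ids.

    Hypothetical helper: it only produces the JSON structure; it does not
    talk to a Kafka cluster.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of brokers")
    partitions = []
    for p in range(num_partitions):
        # The first replica in the list is the preferred leader for the partition.
        replicas = [
            broker_ids[(p + r) % len(broker_ids)]
            for r in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Assumed scenario: brokers 1 and 2 existed, brokers 3 and 4 were just added.
plan = build_reassignment("orders", 6, [1, 2, 3, 4], 2)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and handed to `kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute`. Moving partitions copies data between brokers, so on a busy cluster it is worth doing in small batches.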

    To understand how consumers and producers scale with the number of partitions, I recommend reading the key concepts section of the Kafka documentation: https://kafka.apache.org/documentation/#intro_concepts_and_terms

    Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic's partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
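
The key-to-partition rule in the passage above can be illustrated with a small sketch. This is not Kafka's actual partitioner: the Java client's default partitioner hashes the key bytes with murmur2, while the hypothetical `partition_for` below uses CRC32 from the Python standard library as a stand-in. The keys and the partition count of 6 are made up for the example.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    # Stand-in hash: CRC32, NOT the murmur2 hash used by Kafka's Java client.
    return zlib.crc32(key) % num_partitions


keys = [b"customer-1", b"customer-2", b"customer-1", b"vehicle-7"]
for key in keys:
    # Both occurrences of "customer-1" print the same partition number.
    print(key.decode(), "->", partition_for(key, 6))
```

For a fixed partition count, events with the same key always land in the same partition, which is what preserves per-key ordering. Because the result depends on the partition count, adding partitions to an existing topic can route a key to a different partition from then on, so it is worth planning the partition count up front if you rely on key-based ordering.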