Search code examples
apache-kafkadistributed-system

What is the different between the master node in distributed systems and the controller in the Kafka cluster?


I already know, Apache Kafka doesn't follow the master-worker architecture. It has the controller. The controller doing some administrative tasks such as managing the states of partitions, replicas... If the controller responsibilities don't different from the master node which is in the distributed systems, why we aren't called master-worker?


Solution

  • In those types of architectures, you're often submitting tasks to the master, which the task, as a whole, is distributed to the "workers". In the Kafka client protocol, all brokers are considered equal, although the Controller would elect some as leaders for partitions. From the Controller, there's only metadata that's coordinated and state that's maintained to redirect requests to partition leaders, which replicas then read from as followers. The Controller doesn't collect and return back the status of the work it has assigned to those leaders, either (producer acks or consumer commits)

    There's also the Consumer GroupCoordinator, but I don't think you're asking about that