
Setting up Apache Kafka Cluster


I've been experimenting with Apache Kafka, the distributed streaming platform, but I'm having difficulties with the "distributed" aspect of it.

I'm using the example here, which works fine when everything is on the same machine, but I want to run it as a cluster on two or more VMs.

What I managed to do so far:

  • Setting up the VMs properly with Host-Only adapters.
  • Setting up a ZooKeeper cluster (quorum mode, as pointed out by Rajkumar Natarajan) by adding the following to /etc/zookeeper/conf/zoo.cfg:

    server.1=192.168.56.101:2888:3888
    server.2=192.168.56.102:2888:3888
    

    and making sure the myid file in /var/lib/zookeeper holds a unique number on each server, matching that server's server.N line. Running bin/zkServer.sh status reports Mode: leader on one server and Mode: follower on the others, as expected.

  • Setting up the Kafka cluster by changing the following in config/server.properties:

    # 0 on the first server, 1 on the second
    broker.id=0
    zookeeper.connect=192.168.56.101:2181,192.168.56.102:2181
    
  • Setting up a consumer in Python:

    from kafka import KafkaConsumer

    topic = 'test'  # hypothetical topic name; the original snippet does not define one
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=['192.168.56.101:9092', '192.168.56.102:9092'])
    
  • Setting up a producer in Python:

    from kafka import KafkaProducer
    producer = KafkaProducer(
        bootstrap_servers=['192.168.56.101:9092', '192.168.56.102:9092'])
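With the producer and consumer above pointed at both brokers, a quick end-to-end check is to write one message and read it back. This is a minimal sketch assuming kafka-python (the library the snippets above import) and a hypothetical topic named `test`; the names `round_trip`, `BROKERS` and `TOPIC` are illustrative only. If the brokers advertise addresses the client machine cannot reach, the send will most likely fail with a timeout instead of succeeding.

```python
"""Sketch: round-trip one message through the two-broker cluster (kafka-python)."""

from typing import List

BROKERS = ["192.168.56.101:9092", "192.168.56.102:9092"]
TOPIC = "test"  # hypothetical topic name; use the topic from your own setup


def round_trip(brokers: List[str], topic: str, payload: bytes) -> bool:
    """Send one message, then read the topic from the start until it shows up."""
    # Imported here so the module can be loaded without kafka-python installed.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers=brokers)
    # .get() blocks until the broker acknowledges the write, or raises on timeout.
    metadata = producer.send(topic, payload).get(timeout=10)
    producer.close()
    print(f"written to partition {metadata.partition} at offset {metadata.offset}")

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=brokers,
        auto_offset_reset="earliest",  # start from the oldest retained message
        consumer_timeout_ms=10000,     # stop iterating after 10 s without messages
    )
    try:
        return any(message.value == payload for message in consumer)
    finally:
        consumer.close()


if __name__ == "__main__":
    print(round_trip(BROKERS, TOPIC, b"hello cluster"))
```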
    

What I want to do:

Configure Kafka so that two or more brokers running on different VMs form a single cluster.
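Going from one machine to several VMs mostly comes down to a few per-VM values that have to agree with each other: the N of each server.N line in zoo.cfg must match that host's myid, every broker needs its own broker.id, and (per the Solution section below) each broker needs a listeners line with its own address. The sketch below derives all of them from one list of VM addresses, assuming the ports used in this question; `render_cluster_config` is a hypothetical helper that only returns the text and writes no files.

```python
"""Hypothetical helper: derive per-VM ZooKeeper and Kafka settings from one host list."""

from typing import Dict, List

ZK_CLIENT_PORT = 2181     # port used in zookeeper.connect
ZK_QUORUM_PORT = 2888     # follower-to-leader port, as in the question's zoo.cfg
ZK_ELECTION_PORT = 3888   # leader-election port, as in the question's zoo.cfg
KAFKA_PORT = 9092         # broker port used by the Python clients


def render_cluster_config(hosts: List[str]) -> Dict[str, Dict[str, str]]:
    """Return, per host, the contents of its myid file, its zoo.cfg server
    lines and its server.properties overrides."""
    # The server.N lines are identical on every ZooKeeper host.
    zoo_servers = "\n".join(
        f"server.{n}={host}:{ZK_QUORUM_PORT}:{ZK_ELECTION_PORT}"
        for n, host in enumerate(hosts, start=1)
    )
    zk_connect = ",".join(f"{host}:{ZK_CLIENT_PORT}" for host in hosts)

    configs: Dict[str, Dict[str, str]] = {}
    for broker_id, host in enumerate(hosts):
        configs[host] = {
            # myid must equal the N of this host's own server.N line.
            "myid": str(broker_id + 1),
            "zoo.cfg": zoo_servers,
            # No trailing "# ..." comments: in a .properties file they
            # would become part of the value.
            "server.properties": "\n".join([
                f"broker.id={broker_id}",
                f"zookeeper.connect={zk_connect}",
                f"listeners=PLAINTEXT://{host}:{KAFKA_PORT}",
            ]),
        }
    return configs


if __name__ == "__main__":
    for host, files in render_cluster_config(["192.168.56.101", "192.168.56.102"]).items():
        print(f"== {host} ==")
        for name, text in files.items():
            print(f"--- {name}\n{text}")
```

Numbering myid from 1 and broker.id from 0 simply mirrors the values in this question; any scheme works as long as each value is unique and myid matches server.N.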

My setup:

  • Host: Windows 10 (1803) with VirtualBox 5.2.20
  • Guests: Ubuntu 18.04, Kafka 2.0.0

Solution

  • It took me some time to find the solution, since most tutorials stop short of the clustering part or demonstrate it on a single machine instead of several:

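    All that needs to be done is to add a listeners line to config/server.properties on each broker, using that VM's own address. Without it, a broker advertises its host name instead, which the other VMs and the clients may not be able to resolve or reach:

    # on the broker with broker.id=0
    listeners=PLAINTEXT://192.168.56.101:9092
    # on the broker with broker.id=1
    listeners=PLAINTEXT://192.168.56.102:9092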
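To see what each broker actually advertises, before and after adding the listeners line, one can read the broker registrations from ZooKeeper: every live broker creates an ephemeral znode /brokers/ids/<broker.id> whose JSON payload includes an "endpoints" list. The sketch below assumes the third-party kazoo client for the live lookup (bin/zookeeper-shell.sh would work just as well); `parse_endpoints` and `registered_brokers` are hypothetical names.

```python
"""Sketch: list the endpoints each Kafka broker has registered in ZooKeeper.

Assumes the third-party `kazoo` package (pip install kazoo) for the live lookup.
"""

import json
from typing import Dict, List


def parse_endpoints(znode_data: bytes) -> List[str]:
    """Extract the advertised endpoints from a /brokers/ids/<id> payload."""
    registration = json.loads(znode_data.decode("utf-8"))
    return registration.get("endpoints", [])


def registered_brokers(zk_hosts: str) -> Dict[int, List[str]]:
    """Map each live broker.id to the endpoints it advertises."""
    # Imported here so parse_endpoints can be used without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts)
    zk.start(timeout=10)
    try:
        brokers = {}
        for broker_id in zk.get_children("/brokers/ids"):
            data, _stat = zk.get(f"/brokers/ids/{broker_id}")
            brokers[int(broker_id)] = parse_endpoints(data)
        return dict(sorted(brokers.items()))
    finally:
        zk.stop()
        zk.close()


if __name__ == "__main__":
    brokers = registered_brokers("192.168.56.101:2181,192.168.56.102:2181")
    for broker_id, endpoints in brokers.items():
        print(broker_id, endpoints)
```

With both brokers up and the listeners lines in place, the expected result is one entry per broker.id, each listing its own PLAINTEXT://192.168.56.x:9092 address.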