Kafka broker setup


To connect to a Kafka cluster, I've been provided with a set of bootstrap servers, each given as a host name and port:

s1:9092
s2:9092
s3:9092

Kafka and Zookeeper are running on the instance s4. The page at https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html states:

bootstrap server is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself.

I quote the above bootstrap server definition because I'm trying to understand the relationship between the Kafka brokers s1,s2,s3 and the Kafka and Zookeeper processes running on s4.

To connect to the Kafka cluster, I set the broker to a CSV list of 's1,s2,s3'. After sending messages to that list of brokers, I verify that they were added to the topic by SSHing onto the s4 box and viewing the messages on the topic.

What is the link between the Kafka brokers s1,s2,s3 and s4? I cannot ssh onto any of the brokers s1,s2,s3, as they do not seem to be accessible using ssh. Should s1,s2,s3 be accessible?

The individual responsible for the setup of the Kafka box is no longer available, and I'm confused as to how this configuration works. I've searched for config references of the brokers s1,s2,s3 on s4 but there does not appear to be any configuration.

When Kafka is being set up and configured, what links the brokers (in this case s1,s2,s3) to s4?

I start Kafka and Zookeeper on the same server, s4.

Should Kafka and Zookeeper also be running on s1,s2,s3?


Solution

  • What is the link between the Kafka brokers s1,s2,s3 and s4?

    As per the Kafka documentation about adding nodes to a cluster, each server must share the same zookeeper.connect string and have a unique broker.id to be part of the cluster.

    You can check which nodes are in the cluster by running ls /brokers/ids in zookeeper-shell, through the Kafka AdminClient API, or with kafkacat -L.
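
If you can obtain the server.properties text from each broker, the two settings mentioned above can be compared with a short script. This is only a sketch in Python using the standard library; the function names are made up for illustration, and the sample value s4:2181 assumes Zookeeper listens on its default port on s4, which the question does not confirm. It compares the zookeeper.connect strings literally, so two strings that list the same Zookeeper hosts in a different order would be reported as different.

```python
def parse_properties(text):
    """Parse Java-style key=value properties, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_same_cluster(configs):
    """configs maps a host label to the text of its server.properties.

    Returns a list of problems; an empty list means the two settings
    look consistent with the brokers belonging to one cluster.
    """
    problems = []
    parsed = {host: parse_properties(text) for host, text in configs.items()}

    # Every broker should point at the same Zookeeper connection string.
    zk_values = {p.get("zookeeper.connect") for p in parsed.values()}
    if len(zk_values) != 1 or None in zk_values:
        problems.append("zookeeper.connect differs or is missing")

    # Explicit broker.id values must not repeat. Brokers without an
    # explicit broker.id are skipped here.
    seen = {}
    for host, p in parsed.items():
        broker_id = p.get("broker.id")
        if broker_id is None:
            continue
        if broker_id in seen:
            problems.append(
                f"{host}: broker.id {broker_id} already used by {seen[broker_id]}"
            )
        else:
            seen[broker_id] = host
    return problems
```

An empty result only says the files agree with each other; whether the brokers have actually registered is still best confirmed with ls /brokers/ids or kafkacat -L.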

    should s1,s2,s3 be accessible?

    Via SSH? They don't have to be.

    They should, however, respond to TCP connections from your Kafka client machines on their Kafka server ports.
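
One way to test that reachability without SSH is to open a plain TCP connection to each bootstrap address. The following is a minimal Python sketch using only the standard library; parse_bootstrap_servers and can_connect are hypothetical helper names, not part of any Kafka client. A successful connection only shows that something is listening on the port, not that it is a healthy Kafka broker.

```python
import socket


def parse_bootstrap_servers(servers):
    """Split a comma-separated 'host:port,host:port' string into (host, port) pairs."""
    pairs = []
    for item in servers.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port_text = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        port = int(port_text)  # raises ValueError for non-numeric ports
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {item!r}")
        pairs.append((host, port))
    return pairs


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from a Kafka client machine, something like `[(h, p, can_connect(h, p)) for h, p in parse_bootstrap_servers("s1:9092,s2:9092,s3:9092")]` would show which of the listed addresses accept connections.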

    Should Kafka and Zookeeper also be running on s1,s2,s3?

    You should not have 4 Zookeeper servers in a cluster; a Zookeeper ensemble should have an odd number of servers.
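
The odd-number advice comes from majority-quorum arithmetic: Zookeeper stays available only while more than half of its servers are up, so an even-sized ensemble adds a machine without tolerating any additional failure. A small Python sketch of that arithmetic (the function names are illustrative only, not Zookeeper APIs):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)
```

With these definitions, an ensemble of 3 tolerates 1 failure, an ensemble of 4 still tolerates only 1, and an ensemble of 5 tolerates 2.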

    Otherwise, you've at least been given some ports for Kafka on those machines, so Kafka should be running on them.