apache-kafka kafka-consumer-api

Why is a single-node, multiple-broker Kafka cluster not preferred?


I am trying to put Kafka into production and want to know why a single-node, multiple-broker Kafka setup is not preferred. A few people suggested that if multiple brokers run on a single node, each should be allocated separate disk space, but the reason for this is not clear.

Can someone please explain the impact of running a single broker vs. multiple brokers on a single node?


Solution

  • A topic is a particular stream of data (similar to a table in a database). Topics are split into partitions (as many as you like), and each message within a partition gets an incremental ID, known as its offset, as shown below.

    Partition 0:

    +---+---+---+-----+
    | 0 | 1 | 2 | ... |
    +---+---+---+-----+
    

    Partition 1:

    +---+---+---+---+----+
    | 0 | 1 | 2 | 3 | .. |
    +---+---+---+---+----+
    

    Now a Kafka cluster is composed of multiple brokers. Each broker is identified with an ID and can contain certain topic partitions.

    Example with 2 topics (having 3 and 2 partitions respectively):

    Broker 1:

    +-------------------+
    |      Topic 1      |
    |    Partition 0    |
    |                   |
    |                   |
    |     Topic 2       |
    |   Partition 1     |
    +-------------------+
    

    Broker 2:

    +-------------------+
    |      Topic 1      |
    |    Partition 2    |
    |                   |
    |                   |
    |     Topic 2       |
    |   Partition 0     |
    +-------------------+
    

    Broker 3:

    +-------------------+
    |      Topic 1      |
    |    Partition 1    |
    |                   |
    |                   |
    |                   |
    |                   |
    +-------------------+
    

    Note that the data is distributed across brokers (and Broker 3 doesn't hold any data of topic 2).

    Topics should have a replication factor > 1 (usually 2 or 3) so that when a broker is down, another one can serve the topic's data. For instance, assume that we have a topic with 2 partitions and a replication factor of 2, as shown below:

    Broker 1:

    +-------------------+
    |      Topic 1      |
    |    Partition 0    |
    |                   |
    |                   |
    |                   |
    +-------------------+
    

    Broker 2:

    +-------------------+
    |      Topic 1      |
    |    Partition 0    |
    |                   |
    |                   |
    |     Topic 1       |
    |   Partition 1     |
    +-------------------+
    

    Broker 3:

    +-------------------+
    |      Topic 1      |
    |    Partition 1    |
    |                   |
    |                   |
    |                   |
    +-------------------+
    

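    Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for topic 1. A replication factor of 3 is generally a good idea, since it allows one broker to be taken down for maintenance while another one goes down unexpectedly. This replication is what gives Apache Kafka its durability and fault tolerance.

    These guarantees only hold when the replicas fail independently. If all brokers run on a single node, a failure of that node takes every replica down at once, so the replication factor no longer protects you. The same applies to storage: brokers that share one disk compete for its I/O and lose all their replicas together if that disk fails, which is why separate disks are suggested when several brokers must share a node.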
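
The replication example above can be sketched in a few lines of Python to check which partitions stay readable after a failure. This is only an illustrative model, not Kafka code: the `available_partitions` helper and the host names (`host-a`, `host-b`, `host-c`) are made up for this sketch, and it assumes a partition is available as long as at least one broker holding a replica is still running.

```python
# Replica placement from the example above: topic 1 with 2 partitions and a
# replication factor of 2. Keys are partition numbers, values are broker IDs.
REPLICAS = {0: {1, 2}, 1: {2, 3}}


def available_partitions(replicas, broker_hosts, failed_hosts):
    """Return the sorted partitions that still have at least one live replica.

    broker_hosts maps a broker ID to the (hypothetical) machine it runs on.
    """
    live_brokers = {
        broker for broker, host in broker_hosts.items() if host not in failed_hosts
    }
    return sorted(p for p, brokers in replicas.items() if brokers & live_brokers)


# Hypothetical layouts: one broker per machine vs. all brokers on one machine.
separate_nodes = {1: "host-a", 2: "host-b", 3: "host-c"}
single_node = {1: "host-a", 2: "host-a", 3: "host-a"}

# Losing the machine that runs Broker 2 leaves both partitions readable.
print(available_partitions(REPLICAS, separate_nodes, {"host-b"}))  # [0, 1]

# Losing the only machine takes every replica down at once.
print(available_partitions(REPLICAS, single_node, {"host-a"}))  # []
```

With one broker per machine, the failure of Broker 2 still leaves a replica of each partition on Brokers 1 and 3. With all three brokers on one machine, the same replication factor gives no protection against that machine going down.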