
What's the difference between Ctrl-C and kill -9 when taking a Kafka broker down?


I am going through the official Kafka tutorial and ran into a strange problem in the multi-broker part.

Here is briefly what I have done:

  • run local zookeeper on port 2181
  • run three kafka brokers on ports 9092, 9093, 9094
  • created a topic with one partition and three replicas: my-replicated-topic
  • produced several messages into this topic

Then I wanted to test Kafka's fault tolerance. Instead of using kill -9, which works as expected, I pressed Ctrl-C to terminate the broker that was the leader for the topic. Here is the problem:

I CANNOT consume any messages from kafka.

What's wrong?

P.S. The commands I used are exactly the ones from the tutorial mentioned above.
Kafka version: 1.0.0

Update: here is some key output:

 bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 0   Replicas: 0,1,2 Isr: 1,0,2

Then I pressed Ctrl-C to stop broker 0:

 bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 0,1,2 Isr: 1,2

At this point I cannot consume from the other brokers:

bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9093 --from-beginning --topic my-replicated-topic

This is the config for broker 0. The configs for the other brokers are the same except for the broker id, port and log directory.

broker.id=0
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0

Solution

  • Your config has offsets.topic.replication.factor=1, so each partition of the internal __consumer_offsets topic lives on a single broker. If you stop the broker that holds the offsets for your consumer, there is no other replica to take over, and the consumer has no offsets to work with.

    A replication factor of 1 on the __consumer_offsets topic is not a fault-tolerant config.

    Changing this setting is not enough on its own: it only applies when the topic is first created, so you will also have to increase the replication factor of the existing __consumer_offsets topic to 3.
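
As a rough sketch of that second step: set offsets.topic.replication.factor=3 in each broker's server.properties, then reassign the existing partitions with kafka-reassign-partitions.sh, which takes a JSON plan listing the replicas for each partition. The helper below only generates that plan. gen_offsets_reassignment is a hypothetical name, the partition count of 50 assumes the default offsets.topic.num.partitions, and the file path under /tmp is just an example. Check the real partition count with kafka-topics.sh --describe before using it.

```shell
#!/bin/sh
# Sketch: print a reassignment plan that places every partition of
# __consumer_offsets on the given brokers.
# gen_offsets_reassignment is a hypothetical helper, not a Kafka tool.
#   $1 = number of partitions (assumed 50, the default
#        offsets.topic.num.partitions; verify on your cluster)
#   $2 = comma-separated broker ids, e.g. 0,1,2
gen_offsets_reassignment() {
    partitions=$1
    brokers=$2
    printf '{"version":1,"partitions":['
    p=0
    while [ "$p" -lt "$partitions" ]; do
        # Separate entries with a comma, except before the first one.
        if [ "$p" -gt 0 ]; then
            printf ','
        fi
        printf '{"topic":"__consumer_offsets","partition":%d,"replicas":[%s]}' \
            "$p" "$brokers"
        p=$((p + 1))
    done
    printf ']}\n'
}

# Usage, with all three brokers running (flags as in Kafka 1.0.0):
#   gen_offsets_reassignment 50 0,1,2 > /tmp/offsets-reassign.json
#   bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#       --reassignment-json-file /tmp/offsets-reassign.json --execute
#   bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#       --reassignment-json-file /tmp/offsets-reassign.json --verify
```

Note that this plan lists the brokers in the same order for every partition, so the first broker becomes the preferred leader for all of them. That is acceptable for a local test cluster; on a real cluster you would probably want to rotate the replica order across partitions.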