I have a cluster of 3 brokers. The topic I'm using is configured with a replication factor of 3 and 4 partitions. I'm using the Kafka Python API to set up the producers and the consumers.
My problem is that when I stop one of the brokers, in some cases, my consumers stop receiving messages.
This is the topic description when broker 1 is unavailable:
Topic: frogakas Partition: 0 Leader: 5 Replicas: 5,3,1 Isr: 5,3
Topic: frogakas Partition: 1 Leader: 3 Replicas: 3,1,5 Isr: 3,5
Topic: frogakas Partition: 2 Leader: 5 Replicas: 1,5,3 Isr: 5,3
Topic: frogakas Partition: 3 Leader: 5 Replicas: 5,1,3 Isr: 5,3
The thing is that when I shut down any of the brokers, some of the consumer groups disappear. All of my consumers share the same group ID, so they stop working because their group is no longer in the group list.
When all of the brokers are available these are the groups:
root@m3-virtual-machine:/opt/kafka/bin# ./kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
test_group_kas
test_group_1
test_group_2
test_group
When broker 1 is unavailable, only test_group_2 is displayed:
root@m3-virtual-machine:/opt/kafka/bin# ./kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
test_group_2
At this point, if I start a consumer with the group ID test_group_2, the messages are received correctly.
I'd like to know if there is any way to avoid the service interruption when a broker fails.
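With the sample server.properties that ships with Kafka, the broker starts with offsets.topic.replication.factor=1, so each partition of the internal __consumer_offsets topic lives on a single broker.
If you shut down the broker hosting the offsets partition for your group, consumer group management for that group won't work, even though the data topic itself still has in-sync replicas. Setting offsets.topic.replication.factor=3 before the offsets topic is first created, or reassigning the existing __consumer_offsets partitions to more replicas, avoids this.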
The one group that is still listed was probably created by mistake by a consumer that you configured with a ZooKeeper address instead of bootstrap servers.
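To see which partition of the offsets topic a given group depends on, here is a minimal Python sketch. It assumes the coordinator picks the partition as abs(groupId.hashCode) % offsets.topic.num.partitions, with the default of 50 partitions; that mapping is my understanding of the broker's behaviour, so verify it against your Kafka version. The function names are made up for this example.

```python
def java_string_hashcode(s):
    """Emulate Java's String.hashCode() (signed 32-bit result)."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition(group_id, num_partitions=50):
    """Partition of __consumer_offsets assumed to hold this group's offsets.

    Assumption: the broker masks off the sign bit of the hash and takes it
    modulo offsets.topic.num.partitions (50 unless you changed it).
    """
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    # Group IDs taken from the question.
    for group in ["test_group_kas", "test_group_1", "test_group_2", "test_group"]:
        print(group, "->", offsets_partition(group))
```

If you then describe the __consumer_offsets topic with kafka-topics.sh, you can check which broker holds each of those partitions. A group whose partition has the stopped broker as its only replica is the one that disappears from the list.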