Ruby Kafka Uncaught exception: Failed to find group coordinator


I run Apache Kafka in a Docker container (https://hub.docker.com/r/wurstmeister/kafka/).

I'm able to successfully connect to Kafka from my Java application with Spring Kafka.

But when I try to connect to Kafka from the Ruby application via Ruby Kafka, I receive the following error:

Uncaught exception: Failed to find group coordinator

The only difference between the Java and Ruby applications is that the Ruby application runs on another machine in my local network. From the Ruby machine I can reach the Kafka machine and all of its ports.

How can I find the issue and solve it?

UPDATED

I, [2018-06-25T10:06:49.513848 #62261]  INFO -- : New topics added to target list: post.sent
I, [2018-06-25T10:06:49.514036 #62261]  INFO -- : Fetching cluster metadata from kafka://10.0.0.102:9093
D, [2018-06-25T10:06:49.514262 #62261] DEBUG -- : Opening connection to 10.0.0.102:9093 with client id test...
D, [2018-06-25T10:06:49.518350 #62261] DEBUG -- : Sending topic_metadata API request 1 to 10.0.0.102:9093
D, [2018-06-25T10:06:49.519336 #62261] DEBUG -- : Waiting for response 1 from 10.0.0.102:9093
D, [2018-06-25T10:06:49.530220 #62261] DEBUG -- : Received response 1 from 10.0.0.102:9093
I, [2018-06-25T10:06:49.530351 #62261]  INFO -- : Discovered cluster metadata; nodes: 10.0.75.1:9093 (node_id=1001)
D, [2018-06-25T10:06:49.530439 #62261] DEBUG -- : Closing socket to 10.0.0.102:9093
I, [2018-06-25T10:06:49.530682 #62261]  INFO -- : Joining group `my_group`
D, [2018-06-25T10:06:49.530812 #62261] DEBUG -- : Getting group coordinator for `my_group`
D, [2018-06-25T10:06:49.531019 #62261] DEBUG -- : Opening connection to 10.0.75.1:9093 with client id test...
D, [2018-06-25T10:06:49.616368 #62261] DEBUG -- : Handling fetcher command: subscribe
I, [2018-06-25T10:06:49.616797 #62261]  INFO -- : Will fetch at most 1048576 bytes at a time per partition from post.sent
D, [2018-06-25T10:06:49.617262 #62261] DEBUG -- : Handling fetcher command: configure
D, [2018-06-25T10:06:49.617462 #62261] DEBUG -- : Handling fetcher command: start
D, [2018-06-25T10:06:49.617599 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:49.618108 #62261]  INFO -- : Fetching cluster metadata from kafka://10.0.0.102:9093
D, [2018-06-25T10:06:49.619053 #62261] DEBUG -- : Opening connection to 10.0.0.102:9093 with client id test...
D, [2018-06-25T10:06:49.624053 #62261] DEBUG -- : Sending topic_metadata API request 1 to 10.0.0.102:9093
D, [2018-06-25T10:06:49.625459 #62261] DEBUG -- : Waiting for response 1 from 10.0.0.102:9093
D, [2018-06-25T10:06:49.635283 #62261] DEBUG -- : Received response 1 from 10.0.0.102:9093
I, [2018-06-25T10:06:49.635468 #62261]  INFO -- : Discovered cluster metadata; nodes: 10.0.75.1:9093 (node_id=1001)
D, [2018-06-25T10:06:49.635596 #62261] DEBUG -- : Closing socket to 10.0.0.102:9093
I, [2018-06-25T10:06:49.635853 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:50.637187 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:50.637804 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:51.642172 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:51.642471 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:52.645354 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:52.645640 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:53.647833 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:53.648259 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:54.650357 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:54.650647 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:55.652582 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:55.653477 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:56.657937 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:56.659627 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:57.664130 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:57.664861 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
D, [2018-06-25T10:06:58.666290 #62261] DEBUG -- : Fetching batches
I, [2018-06-25T10:06:58.666620 #62261]  INFO -- : There are no partitions to fetch from, sleeping for 1s
E, [2018-06-25T10:06:59.534809 #62261] ERROR -- : Timed out while trying to connect to 10.0.75.1:9093: Operation timed out
D, [2018-06-25T10:06:59.535083 #62261] DEBUG -- : Closing socket to 10.0.75.1:9093
E, [2018-06-25T10:06:59.535342 #62261] ERROR -- : Failed to get group coordinator info from 10.0.75.1:9093 (node_id=1001): Operation timed out
I, [2018-06-25T10:06:59.535567 #62261]  INFO -- : Leaving group `my_group`
D, [2018-06-25T10:06:59.535709 #62261] DEBUG -- : Getting group coordinator for `my_group`
D, [2018-06-25T10:06:59.535875 #62261] DEBUG -- : Opening connection to 10.0.75.1:9093 with client id test...
D, [2018-06-25T10:06:59.666983 #62261] DEBUG -- : Handling fetcher command: stop
E, [2018-06-25T10:07:09.540409 #62261] ERROR -- : Timed out while trying to connect to 10.0.75.1:9093: Operation timed out
D, [2018-06-25T10:07:09.540833 #62261] DEBUG -- : Closing socket to 10.0.75.1:9093
E, [2018-06-25T10:07:09.541172 #62261] ERROR -- : Failed to get group coordinator info from 10.0.75.1:9093 (node_id=1001): Operation timed out
Exiting
Uncaught exception: Failed to find group coordinator
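
One detail in the log above is worth checking directly: the client bootstraps through 10.0.0.102:9093, but the metadata response advertises the broker as 10.0.75.1:9093, and every later connection attempt to that address times out. The following is a minimal sketch, using only the Ruby standard library, that pulls the advertised address out of such a log line and tests whether it is reachable from the Ruby machine. The helper names `advertised_nodes` and `reachable?` are hypothetical and not part of ruby-kafka.

```ruby
require "socket"

# Extracts [host, port] pairs from a "Discovered cluster metadata" log line.
# (Hypothetical helper; it only parses the text format seen in the log above.)
def advertised_nodes(log_line)
  log_line.scan(/([^\s:]+):(\d+) \(node_id=\d+\)/).map { |host, port| [host, port.to_i] }
end

# Returns true if a TCP connection to host:port opens within `timeout` seconds.
def reachable?(host, port, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example calls, with addresses taken from the log above:
#   reachable?("10.0.0.102", 9093)  # bootstrap address used by the client
#   reachable?("10.0.75.1", 9093)   # address the broker advertises in its metadata
```

If the first call succeeds and the second fails, the address the broker advertises is not routable from the Ruby machine, in which case the broker's advertised host or listener settings may be worth reviewing before anything else.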

Solution

  • Your broker is returning a Group Coordinator Not Available error. Under normal conditions this is temporary and lasts only while your Kafka cluster assigns a coordinator node. In your case, unfortunately, something in your cluster is not operating correctly and a coordinator is not being assigned.

    You should check your cluster configuration, starting with your logs.

    You may find the solution posted here useful, which I quote:

    When using the bootstrap-server parameter, the connection goes through the brokers instead of ZooKeeper. The brokers use the __consumer_offsets topic to store committed offsets for each topic:partition per consumer group (group ID). In this case, __consumer_offsets was pointing to invalid broker IDs, hence the exception above. To check whether the broker IDs are correct for this topic, execute the following command:

    kafka-topics.sh --describe --zookeeper <zkHost:zkPort> --topic __consumer_offsets
    

    Then compare them with the brokers registered in ZooKeeper. Connect using the following command:

    zkCli.sh -server <zkHost:zkPort>
    

    After connecting to ZooKeeper, list the broker IDs using the following command:

    [zk: server1.openstacklocal:2181(CONNECTED) 0] ls /brokers/ids
    

    If the broker IDs do not match, proceed with the solution below.

    Solution:

    To resolve this issue, do the following:

    Connect to ZooKeeper using the following command:
    
     zkCli.sh -server <zkHost:zkPort>
    

    Remove __consumer_offsets using the following command:

     rmr /brokers/topics/__consumer_offsets
    

    Restart the brokers.

    Side note: this problem sometimes also occurs while ZooKeeper is still starting up.
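
The broker ID comparison described in the quoted steps can also be done programmatically. The sketch below assumes you have saved the text printed by `kafka-topics.sh --describe` and by `ls /brokers/ids` into two strings; it parses the `Replicas:` fields and the bracketed ID list and reports any broker IDs that __consumer_offsets references but ZooKeeper does not list. The function names are hypothetical, and the parsing assumes the usual tab-separated `Replicas: 1001,1002` layout, which may differ between Kafka versions.

```ruby
require "set"

# Collects every broker ID that appears in a "Replicas:" field of the
# `kafka-topics.sh --describe` output (assumed format: "Replicas: 1001,1002").
def replica_broker_ids(describe_output)
  describe_output.scan(/Replicas:\s*([\d,]+)/).flatten
                 .flat_map { |ids| ids.split(",") }
                 .map(&:to_i).to_set
end

# Parses the bracketed list printed by `ls /brokers/ids`, e.g. "[1001, 1002]".
# A zkCli prompt such as "[zk: host:2181(CONNECTED) 0]" is ignored because it
# contains characters other than digits, commas, and whitespace.
def registered_broker_ids(ls_output)
  ls_output[/\[([\d,\s]*)\]/, 1].to_s.scan(/\d+/).map(&:to_i).to_set
end

# Broker IDs referenced by __consumer_offsets but not registered in ZooKeeper.
def stale_broker_ids(describe_output, ls_output)
  replica_broker_ids(describe_output) - registered_broker_ids(ls_output)
end
```

An empty result means every replica of __consumer_offsets points at a registered broker, so the mismatch described in the quoted article does not apply and the cause is probably elsewhere.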