
Couchbase Client Java SDK does not use another cluster node when first doesn't respond


So according to this page: http://developer.couchbase.com/documentation/server/current/sdk/java/start-using-sdk.html

// Connects to a cluster on 10.0.0.1 and tries 10.0.0.2
// if the other one does not respond during bootstrap.
Cluster cluster = CouchbaseCluster.create("10.0.0.1", "10.0.0.2");

This seemed straightforward, so I set up a Maven project in Eclipse using the latest Couchbase Java SDK:

<dependency>
    <groupId>com.couchbase.client</groupId>
    <artifactId>java-client</artifactId>
    <version>2.3.4</version>
</dependency>

My code, short and sweet:

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;

public class Main {
    public static void main(String[] args){
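        // 10.200.0.10 is listed first; the other two nodes should act as bootstrap fallbacks.
        Cluster cluster = CouchbaseCluster.create("10.200.0.10", "10.200.0.11", "10.200.0.12");
        // Fetch the cluster info through the cluster manager and print the raw JSON.
        System.out.println(cluster.clusterManager("Administrator", "password").info().raw());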
    }
}

My Couchbase cluster consists of three nodes on three VMs, deployed with Docker. They work well, and I have another application that uses the SDK's features without trouble, but I found something that breaks during testing:

If I run the above code while node 10.200.0.11 and/or node 10.200.0.12 is down, everything works: I get a nice JSON document in the console, including details on all three nodes.

The problem: if I run this code while node 10.200.0.10 is down, it does not try to bootstrap using the other two nodes, as the example in the documentation says it should. Instead, an exception is thrown and the application ends.

Error:

WARNING: [null][ConfigEndpoint]: Could not connect to remote socket.
Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Connection refused: /10.200.0.10:8091
    at com.couchbase.client.core.utils.Blocking.blockForSingle(Blocking.java:85)
    at com.couchbase.client.java.cluster.DefaultClusterManager.info(DefaultClusterManager.java:59)
    at com.couchbase.client.java.cluster.DefaultClusterManager.info(DefaultClusterManager.java:54)
    at quickTestDeleteThis.Main.main(Main.java:10)
Caused by: java.net.ConnectException: Connection refused: /10.200.0.10:8091
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:223)
    at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:285)
    at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:589)
    at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:513)
    at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:427)
    at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:399)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)
Nov 02, 2016 4:45:22 PM com.couchbase.client.core.endpoint.AbstractEndpoint$2 operationComplete
WARNING: [null][ConfigEndpoint]: Could not connect to remote socket.
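
To rule out a networking problem on the remaining nodes, a quick check independent of the SDK is to see which nodes accept TCP connections on port 8091, the port shown in the trace. This is only a minimal sketch using the standard library; the class name PortProbe and the 2-second timeout are illustrative choices, not anything from the Couchbase SDK.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean accepts(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unreachable host, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // Node addresses from the question; 8091 is the port in the stack trace.
        String[] nodes = {"10.200.0.10", "10.200.0.11", "10.200.0.12"};
        for (String node : nodes) {
            System.out.println(node + ":8091 reachable = " + accepts(node, 8091, 2000));
        }
    }
}
```

If the two remaining nodes report as reachable while 10.200.0.10 does not, the failure is in how the client picks a node rather than in the network.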

I assumed it might have something to do with 10.200.0.10 being the orchestrator, with no new orchestrator responding with the cluster information. But according to the Couchbase architecture documentation: http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/Couchbase_Server_Architecture_Review.pdf

If the orchestrator node crashes, existing nodes will detect that it is no longer available and will elect a new orchestrator immediately so that the cluster continues to operate without disruption.

It would seem that if the first node in the list passed to the Cluster object cannot be reached, the client does not try the others. Is this possibly a bug?


Solution

  • This is a known issue that is currently being fixed. It is tracked here: https://issues.couchbase.com/browse/JCBC-999
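
Until a fixed version is available, one possible workaround is to try the nodes one at a time in application code and use the first one that answers. The sketch below is an assumption about how that could look, not the SDK's own behavior: NodeFallback and firstSuccessful are hypothetical names, and the lambda in main is a stand-in that simulates the first node being down, so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class NodeFallback {

    /**
     * Calls attempt with each node in order and returns the first result
     * that does not throw. If every node fails, rethrows the last failure
     * with the earlier failures attached as suppressed exceptions.
     */
    public static <T> T firstSuccessful(List<String> nodes, Function<String, T> attempt) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes given");
        }
        List<RuntimeException> failures = new ArrayList<>();
        for (String node : nodes) {
            try {
                return attempt.apply(node);
            } catch (RuntimeException e) {
                // The trace in the question shows the failure surfacing as a RuntimeException.
                failures.add(e);
            }
        }
        RuntimeException last = failures.remove(failures.size() - 1);
        for (RuntimeException earlier : failures) {
            if (earlier != last) {
                last.addSuppressed(earlier);
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("10.200.0.10", "10.200.0.11", "10.200.0.12");
        // Stand-in for the real call. With the SDK, the lambda would be roughly:
        //   Cluster c = CouchbaseCluster.create(node);
        //   try { return c.clusterManager("Administrator", "password").info().raw(); }
        //   finally { c.disconnect(); }
        String info = firstSuccessful(nodes, node -> {
            if (node.equals("10.200.0.10")) {
                throw new RuntimeException("Connection refused: /" + node + ":8091");
            }
            return "cluster info from " + node;
        });
        System.out.println(info);
    }
}
```

Creating a separate Cluster per attempt is heavier than a single shared instance, so this is only worth doing for the management call that fails here, and only until the linked issue is resolved.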