java spring-boot tcp hazelcast distributed-cache

First hazelcast node is shutting down instead of becoming master


I am trying to form a cluster using TCP/IP discovery. I don't understand why the first node is not chosen as master. There are no other nodes in the cluster, and the error logs are not self-explanatory.

Debug logs :

2020-10-27 05:31:46 DEBUG com.hazelcast.internal.cluster.ClusterService:49 - [192.168.10.31]:5701 [dev] [3.12] Setting master address to null
2020-10-27 05:31:46 DEBUG com.hazelcast.cluster.impl.TcpIpJoiner:49 - [192.168.10.31]:5701 [dev] [3.12] PostJoin master: null, isMaster: false
2020-10-27 05:31:46 ERROR com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Could not join cluster. Shutting down now!
2020-10-27 05:31:46 INFO  com.hazelcast.core.LifecycleService:49 - [192.168.10.31]:5701 [dev] [3.12] [192.168.10.31]:5701 is SHUTTING_DOWN
2020-10-27 05:31:46 WARN  com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Terminating forcefully...
2020-10-27 05:31:46 DEBUG com.hazelcast.internal.cluster.ClusterService:49 - [192.168.10.31]:5701 [dev] [3.12] Setting master address to null
2020-10-27 05:31:46 INFO  com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Shutting down connection manager...

Hazelcast version : 3.12

<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast</artifactId>
  <version>3.12</version>
</dependency>

Hazelcast config :

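import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.NetworkConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

String hazelcastClusterMemberOne = "192.168.10.*";
Config config = new Config();
NetworkConfig network = config.getNetworkConfig();
JoinConfig join = network.getJoin();
// Disable multicast so that only TCP/IP discovery is used
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember(hazelcastClusterMemberOne)
        .setEnabled(true);

HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config);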

Error Logs :

2020-10-27 05:31:46 [main] ERROR com.hazelcast.instance.Node com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Could not join cluster. Shutting down now!
2020-10-27 05:31:46 [main] INFO  com.hazelcast.core.LifecycleService com.hazelcast.core.LifecycleService:49 - [192.168.10.31]:5701 [dev] [3.12] [192.168.10.31]:5701 is SHUTTING_DOWN
2020-10-27 05:31:46 [main] WARN  com.hazelcast.instance.Node com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Terminating forcefully...
2020-10-27 05:31:46 [main] INFO  com.hazelcast.instance.Node com.hazelcast.instance.Node:49 - [192.168.10.31]:5701 [dev] [3.12] Shutting down connection manager...

EDIT: This happens on a server hosted on AWS; the same config works fine on my local machine.


Solution

  • Try changing from a wildcard to an explicit IP address.

    I.e., not

    getTcpIpConfig().addMember("192.168.10.*")
                    .setEnabled(true);
    

    but

    getTcpIpConfig().addMember("192.168.10.1")
                    .setEnabled(true);
    

    Or, if you need several possibilities, list them explicitly:

    getTcpIpConfig().addMember("192.168.10.1")
                    .addMember("192.168.10.2")
                    .addMember("192.168.10.3")
                    .setEnabled(true);
    

    UPDATED BELOW

    TcpIpConfig wasn't intended for use with a large range of possibilities. Wildcard is not implemented for this field. You could list all 256 possibilities, or submit a PR implementing wildcard. Either way, that is 256 addresses to probe, which will be on the slow side.

    If you know the address of the first node at runtime, you could pass this into the others as a property.

    If you don't, then one of the other discovery mechanisms is probably going to be a better choice.

    Note also that TcpIpConfig only specifies the discovery mechanism, not the communication mechanism used once members are discovered. The performance of member-to-member communication is unrelated to the choice of discovery mechanism.

    UPDATED 2 BELOW

    The above answer is incorrect: having now tried it with 3.12.0, I found that wildcard is implemented.
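
For the suggestion above about passing the first node's address to the other nodes as a property, here is a minimal sketch of one way it might look. The property name `cluster.seed.members` and the `SeedMembers` helper are hypothetical, not part of Hazelcast. The sketch uses only the JDK so it runs on its own; with Hazelcast on the classpath, each parsed address would be handed to `join.getTcpIpConfig().addMember(...)` as in the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

final class SeedMembers {

    // Hypothetical property name chosen for this sketch; not a Hazelcast built-in.
    static final String SEED_PROPERTY = "cluster.seed.members";

    /** Splits a comma-separated list of addresses, trimming and dropping blanks. */
    static List<String> parseMembers(String raw) {
        List<String> members = new ArrayList<>();
        if (raw == null) {
            return members;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                members.add(trimmed);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        // Read the seed addresses from a JVM system property (-D...)
        List<String> members = parseMembers(System.getProperty(SEED_PROPERTY));
        if (members.isEmpty()) {
            System.out.println("No seed members configured.");
        }
        for (String member : members) {
            // In the real application this is where the address would be
            // passed to join.getTcpIpConfig().addMember(member).
            System.out.println("seed member: " + member);
        }
    }
}
```

Each node would then be started with something like `java -Dcluster.seed.members=192.168.10.31 ...`, so the address is decided at deploy time rather than hard-coded.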