java, tomcat, distributed-caching, hazelcast

Issue with Hazelcast/CONCURRENT_MAP_LOCK after server restart


We are using Hazelcast 1.9.4.4 with a cluster of 6 Tomcat servers. We restarted our cluster, and here is a fragment of the log:

14-Jul-2012 03:25:41 com.hazelcast.nio.InSelector
INFO: /10.152.41.105:5701 [cem-prod] 5701 accepted socket connection from /10.153.26.16:54604
14-Jul-2012 03:25:47 com.hazelcast.cluster.ClusterManager
INFO: /10.152.41.105:5701 [cem-prod]

Members [6] {
        Member [10.152.41.101:5701]
        Member [10.164.101.143:5701]
        Member [10.152.41.103:5701]
        Member [10.152.41.105:5701] this
        Member [10.153.26.15:5701]
        Member [10.153.26.16:5701]
}

We can see that 10.153.26.16 is connected to the cluster, but later in the log there is:

14-Jul-2012 03:28:50 com.hazelcast.impl.ConcurrentMapManager
INFO: /10.152.41.105:5701 [cem-prod] ======= 47: CONCURRENT_MAP_LOCK ========
        thisAddress= Address[10.152.41.105:5701], target= Address[10.153.26.16:5701]
        targetMember= Member [10.153.26.16:5701], targetConn=Connection [/10.153.26.16:54604 -> Address[10.153.26.16:5701]] live=true, client=false, type=MEMBER, targetBlock=Block [2] owner=Address[10.153.26.16:5701] migrationAddress=null
        cemClientNotificationsLock Re-doing [20] times! c:__hz_Locks : null
14-Jul-2012 03:28:55 com.hazelcast.impl.ConcurrentMapManager
INFO: /10.152.41.105:5701 [cem-prod] ======= 57: CONCURRENT_MAP_LOCK ========
        thisAddress= Address[10.152.41.105:5701], target= Address[10.153.26.16:5701]
        targetMember= Member [10.153.26.16:5701], targetConn=Connection [/10.153.26.16:54604 -> Address[10.153.26.16:5701]] live=true, client=false, type=MEMBER, targetBlock=Block [2] owner=Address[10.153.26.16:5701] migrationAddress=null
        cemClientNotificationsLock Re-doing [30] times! c:__hz_Locks : null

After several restarts of the servers (all together, stopping all and starting one by one, etc.) we were able to run the system. Could you explain why Hazelcast fails to lock a map on a node if that node is in the cluster, or, if the node was out of the cluster, why it is displayed as a member? Also, are there any recommendations on how to restart a Tomcat cluster with distributed Hazelcast structures (stop all nodes and start them together, stop and start one by one, shut down Hazelcast somehow before the server restart, etc.)? Thanks!


Solution

  • Could you explain why Hazelcast fails to lock a map on a node if that node is in the cluster?

    The key may simply be locked by some other node at that moment; the repeated "Re-doing" lines indicate the lock request is being retried while it waits. See the sketch after this list.

    There have also been lots of fixes and changes since 1.9.4.4; it is a pretty old version. You should try 2.1+.
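
A minimal sketch of per-key map locking, assuming the Hazelcast 2.x IMap API (the map name "notifications" and the 10-second timeout are invented for illustration; the key name is taken from the log above). Using tryLock with a timeout lets a caller back off instead of blocking while another member holds the lock:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.concurrent.TimeUnit;

public class LockExample {
    public static void main(String[] args) {
        // Join the cluster using the default configuration.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(null);

        // "notifications" is a placeholder map name for this sketch.
        IMap<String, String> map = hz.getMap("notifications");
        String key = "cemClientNotificationsLock";

        // Locks in Hazelcast maps are per key. tryLock with a timeout
        // fails fast instead of retrying indefinitely when another
        // member currently holds the lock on this key.
        if (map.tryLock(key, 10, TimeUnit.SECONDS)) {
            try {
                // ... critical section guarded by the distributed lock ...
            } finally {
                map.unlock(key);
            }
        } else {
            // The lock is held elsewhere; back off and retry later.
            System.err.println("Could not acquire lock on " + key);
        }

        // Shutting the instance down cleanly before a Tomcat restart
        // lets the member leave the cluster gracefully and release
        // its locks and partitions.
        hz.getLifecycleService().shutdown();
    }
}

On the restart question: shutting Hazelcast down explicitly before stopping Tomcat (for example, calling hz.getLifecycleService().shutdown() from a ServletContextListener) is one plausible way to avoid members disappearing abruptly while holding locks, though the exact procedure depends on your deployment.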