Ignite starts very slowly with a large Xmx (up to 64G) and cannot form a cluster (timeout)


I have 3 Ignite servers, each with 128G of memory. When I start the cluster with JVM options setting Xms and Xmx to 64G, each node boots up very, very slowly and takes several minutes.

My JVM options:

JVM_OPTS="-server       \
-Xms64G     \
-Xmx64G     \
-XX:+AlwaysPreTouch     \
-XX:+UseG1GC        \
-XX:+ScavengeBeforeFullGC       \
-XX:+DisableExplicitGC      \
-XX:MaxGCPauseMillis=200        \
-XX:InitiatingHeapOccupancyPercent=45       \
-XX:+PrintGCDateStamps      \
-XX:+PrintGCDetails     \
-Xloggc:/var/log/apache-ignite/apache-ignite-gc.log     \
-XX:+UseGCLogFileRotation       \
-XX:GCLogFileSize=10M       \
-XX:NumberOfGCLogFiles=20       \
-Djava.awt.headless=true"

When the first server is up, I start the second one. It takes a long time to join and form the cluster (it seems to load/read some data, very slowly), but it eventually succeeds. When I start the third one, however, the cluster always fails: the log shows some node exiting (timeout?), and I cannot get a third node to start.

[11:48:33,743][INFO][tcp-disco-sock-reader-#5][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:38607, rmtPort=38607
[11:48:39,468][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=53016]
[11:48:39,469][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=53016]
[11:48:39,469][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:53016, rmtPort=53016]
[11:48:43,770][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=/192.168.28.161:47500, rmtPort=47500]
[11:48:43,775][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@24ac7e3f, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1525448001350, super=TcpDiscoveryAbstractMessage [sndNodeId=84b416f0-f146-4907-9ff6-9391568eacea, id=43b49cb2361-84b416f0-f146-4907-9ff6-9391568eacea, verifierNodeId=84b416f0-f146-4907-9ff6-9391568eacea, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@24ac7e3f, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1525448001350, super=TcpDiscoveryAbstractMessage [sndNodeId=84b416f0-f146-4907-9ff6-9391568eacea, id=43b49cb2361-84b416f0-f146-4907-9ff6-9391568eacea, verifierNodeId=84b416f0-f146-4907-9ff6-9391568eacea, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, order=0, addr=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], daemon=false]]]
[11:48:47,355][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.161, rmtPort=52626]
[11:48:47,355][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.161, rmtPort=52626]
[11:48:47,355][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.161:52626, rmtPort=52626]
[11:48:47,376][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.161:52626, rmtPort=52626
[11:48:49,485][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:53016, rmtPort=53016
[11:48:49,551][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=51850]
[11:48:49,551][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=51850]
[11:48:49,551][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,553][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=84b416f0-f146-4907-9ff6-9391568eacea, rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,554][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=84b416f0-f146-4907-9ff6-9391568eacea, rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,554][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:51850, rmtPort=51850
[11:48:51,166][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[11:48:51,169][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[11:48:51,169][WARNING][disco-event-worker-#101][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=ba873112-9978-4845-9d92-b25816edbb34, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, Redis3/192.168.28.163:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525448931164, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:48:51,176][WARNING][disco-event-worker-#101][GridDiscoveryManager] Stopping local node according to configured segmentation policy.
[11:48:51,177][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=84b416f0-f146-4907-9ff6-9391568eacea, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525448128889, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:48:51,178][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, offheap=25.0GB, heap=64.0GB]
[11:48:51,178][INFO][disco-event-worker-#101][GridDiscoveryManager] Data Regions Configured:
[11:48:51,179][INFO][disco-event-worker-#101][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[11:48:51,184][INFO][Thread-34][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[11:48:51,192][INFO][Thread-34][GridJettyRestProtocol] Command protocol successfully stopped: Jetty REST
[11:48:51,196][INFO][exchange-worker-#102][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=84b416f0-f146-4907-9ff6-9391568eacea, customEvt=null, allowMerge=true]
[11:48:51,197][INFO][exchange-worker-#102][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.IgniteInterruptedCheckedException: null]
[11:48:51,201][INFO][db-checkpoint-thread-#114][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=0ms, reason='timeout']
[11:48:51,212][INFO][Thread-34][GridCacheProcessor] Stopped cache [cacheName=ignite-sys-cache]
[11:48:51,244][INFO][Thread-34][IgniteKernal] 

I searched the net and saw someone suggest using a large value of:

<property name="failureDetectionTimeout" value="200000"/>

With this setting the third node can come up, but with a fatal error:

[15:00:17,086][SEVERE][exchange-worker-#102][GridCachePartitionExchangeManager] Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=d520c781-a2ec-44a3-8f39-b2d72928a811, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525460353176, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=3, nodeId8=7a048246, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525460411312]], crd=TcpDiscoveryNode [id=7a048246-9b6e-4f03-b6bc-f44e69feb0db, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[Redis1/192.168.28.161:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525460417080, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=d520c781-a2ec-44a3-8f39-b2d72928a811, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525460353176, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=3, nodeId8=7a048246, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525460411312]], nodeId=d520c781, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=2145189203], init=true, lastVer=GridCacheVersion [topVer=0, order=1525460352448, nodeOrder=0], partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1525460411332, centralizedAff=false, changeGlobalStateE=null, done=true, state=DONE, evtLatch=0, remaining=[], super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=class o.a.i.IgniteCheckedException: Cluster state change failed., hash=204829629]]
class org.apache.ignite.IgniteCheckedException: Cluster state change failed.
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:2539)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2334)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2071)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I change Xmx to 4G, startup is a little faster (still very slow), and the cluster forms OK with 3 nodes.

How can I use the full 128G of memory, since my workload needs very large memory? And how can I make the 3-node cluster start more quickly? I don't think raising failureDetectionTimeout to 200s is a good approach.


Solution

  • How much RAM do you have on the target machines?

    Ignite 2.x stores data in off-heap storage (i.e., outside of -Xmx). By default, 20% of your RAM is used for the off-heap data region (memory policy), and you can change that in the configuration (see the sketch at the end of this answer). If (Xmx + data region size) ≥ total RAM, your system will eventually grind to a halt and your cluster will segment.

    Do you really need such a large -Xmx for Ignite, considering that data is stored off-heap? I would recommend -Xmx4G and a 60G data region instead.

    Your cluster should start almost immediately. If that doesn't happen, try replacing multicast discovery with static IP (VM) discovery, as in the sketch below.
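
    Below is a minimal Spring XML sketch of such a setup on Ignite 2.4. It illustrates the recommendation above rather than the asker's actual configuration: the 60 GiB region size is the suggested value, persistenceEnabled mirrors the "persistenceEnabled=true" shown in the question's logs, and the three addresses are the server IPs that appear there.

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
      <!-- Size the default off-heap data region instead of growing the Java heap -->
      <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
          <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
              <!-- 60 GiB off-heap for the default region -->
              <property name="maxSize" value="#{60L * 1024 * 1024 * 1024}"/>
              <property name="persistenceEnabled" value="true"/>
            </bean>
          </property>
        </bean>
      </property>
      <!-- Static IP discovery instead of multicast: list all three servers -->
      <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
          <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
              <property name="addresses">
                <list>
                  <value>192.168.28.161:47500..47509</value>
                  <value>192.168.28.162:47500..47509</value>
                  <value>192.168.28.163:47500..47509</value>
                </list>
              </property>
            </bean>
          </property>
        </bean>
      </property>
    </bean>

    With a 4G heap the JVM options from the question can stay as they are, only with -Xms/-Xmx lowered. Note that -XX:+AlwaysPreTouch touches every heap page at JVM start, which is likely a big part of why startup took minutes with a 64G heap.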