Search code examples
ignite

apache ignite node not able to join in the cluster


I'm using apacheignite:2.5.0 docker image deployed in 2 different ec2-instances and using static IP finder config below is the config file, one of the node is unable to join in the cluster. I have attached logs also please find below its accepting connection and disconnecting , i ran docker container with --net=host so conatainer attach all ports to host machine and all ports are opened in security group

#

**>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation=" 
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util.xsd">
    <bean abstract="false" id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">


        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                                                 <list>



              <value>34.241.10.9:47500</value>
                 </list>
                                 </property>                                  
                 </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>**
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.31.29.3], sockAddrs=[/172.31.29.3:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=312, intOrder=157, lastExchangeTime=1529067545288, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Topology snapshot [ver=312, servers=2, clients=0, CPUs=6, offheap=3.8GB, heap=2.0GB]
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Data Regions Configured:
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=710.0 MiB, persistenceEnabled=false]
[12:59:25,309][INFO][exchange-worker-#38][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], crd=true, evt=NODE_JOINED, evtNode=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, customEvt=null, allowMerge=true]
[12:59:25,309][WARNING][disco-event-worker-#37][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.31.29.3], sockAddrs=[/172.31.29.3:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=312, intOrder=157, lastExchangeTime=1529067545288, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:59:25,310][INFO][exchange-worker-#38][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], waitTime=0ms, futInfo=NA]
[12:59:25,310][INFO][exchange-worker-#38][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], crd=true]
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager] Topology snapshot [ver=313, servers=1, clients=0, CPUs=2, offheap=0.69GB, heap=1.0GB]
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager] Data Regions Configured:
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=710.0 MiB, persistenceEnabled=false]
[12:59:25,310][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=312, minorTopVer=0]]
[12:59:25,311][INFO][disco-event-worker-#37][GridCachePartitionExchangeManager] Merge exchange future [curFut=AffinityTopologyVersion [topVer=312, minorTopVer=0], mergedFut=AffinityTopologyVersion [topVer=313, minorTopVer=0], evt=NODE_FAILED, evtNode=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, evtNodeClient=false]
[12:59:25,311][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=313, minorTopVer=0]]
[12:59:25,311][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=313, minorTopVer=0], err=null]
[12:59:25,312][INFO][exchange-worker-#38][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=313, minorTopVer=0], evt=NODE_JOINED, node=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15]
[12:59:25,315][INFO][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=225f750c, uptime=01:42:00.504]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=2]
    ^-- CPU [cur=0.17%, avg=0.4%, GC=0%]
    ^-- PageMemory [pages=200]
    ^-- Heap [used=73MB, free=92.47%, comm=981MB]
    ^-- Non heap [used=53MB, free=96.47%, comm=55MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=6, qSize=0]
    ^-- System thread pool [active=0, idle=8, qSize=0]
[12:59:25,320][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/34.241.7.9, rmtPort=53627]
[12:59:25,320][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/34.241.7.9, rmtPort=53627]
[12:59:25,320][INFO][tcp-disco-sock-reader-#628][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/34.241.7.9:53627, rmtPort=53627]
[12:59:25,325][INFO][tcp-disco-sock-reader-#628][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/34.241.7.9:53627, rmtPort=53627
[12:59:30,332][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/34.241.7.9, rmtPort=50418]
[12:59:30,332][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/34.241.7.9, rmtPort=50418]
[12:59:30,332][INFO][tcp-disco-sock-reader-#629][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/34.241.7.9:50418, rmtPort=50418]
[12:59:30,334][INFO][tcp-disco-sock-reader-#629][TcpDiscoverySpi] Finished 



2nd ignite node logs  

[12:13:12,850][INFO][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[12:13:12,869][WARNING][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[12:13:12,888][WARNING][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[12:13:12,918][WARNING][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[12:13:12,919][INFO][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
[12:13:13,275][INFO][main][ClientListenerProcessor] Client connector processor has started on TCP port 10800
[12:13:13,328][INFO][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[12:13:13,369][INFO][main][IgniteKernal] Non-loopback local IPs: 172.17.0.1, 172.18.0.1, 172.31.29.3, fe80:0:0:0:10f0:92ff:fea1:d09f%vethee2519f, fe80:0:0:0:42:19ff:fe73:ee80%docker_gwbridge, fe80:0:0:0:42:e6ff:fe14:144a%docker0, fe80:0:0:0:4b3:6ff:fe01:7ee0%eth0, fe80:0:0:0:64f4:8bff:fe83:7e97%vethdae9948, fe80:0:0:0:9474:a1ff:fe6b:3368%vethcb2500f
[12:13:13,370][INFO][main][IgniteKernal] Enabled local MACs: 02421973EE80, 0242E614144A, 06B306017EE0, 12F092A1D09F, 66F48B837E97, 9674A16B3368
[12:13:13,429][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15]
[12:13:18,555][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:18:20,925][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:23:22,710][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:28:23,988][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:33:25,004][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:38:25,815][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:43:26,831][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:48:27,916][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]

Solution

  • If you are using same config file for starting 2 nodes, then try to use localPortRange in DiscoverySpi.