.net, iis, ignite

Apache Ignite.NET server node hangs on Start()


I have configured my Apache Ignite.NET instance to run as a server node:

var cfg = new IgniteConfiguration
{
    CommunicationSpi = new TcpCommunicationSpi
    {
        LocalPort = config.CommunicationPort,
        LocalPortRange = config.CommunicationPortRange,
        MaxConnectTimeout = TimeSpan.FromMilliseconds(10000),
        ConnectTimeout = TimeSpan.FromMilliseconds(1000)
    },
    AutoGenerateIgniteInstanceName = true,
    ClientMode = false,
    IsActiveOnStart = true,
    DiscoverySpi = new TcpDiscoverySpi
    {
        LocalPort = config.DiscoveryPort,
        LocalPortRange = config.DiscoveryPortRange,
        ForceServerMode = true,
        LocalAddress = localAddress,
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            Endpoints = config.ClusterEndPoints
        }
    },
    Localhost = config.LocalAddress,
};

I set ForceServerMode = true, and the IpFinder's Endpoints list contains my local IP along with the IPs of the other nodes in my cluster.

What I'm seeing is that, for some reason, the join requests sent by Ignite time out. Here's the exception log I get:

Level: [Error], Message:[Exception on direct send: connect timed out] Native:[java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1376)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1339)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1159)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1006)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:851)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:358)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1834)
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:837)
at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1770)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:977)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:574)
at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48)
at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:76)

]

That's fine; maybe there is a network issue (partitioning, a firewall, etc.). I can figure that out.

What I don't understand is why the call that starts the Ignite node hangs. I expected it to try to connect to those endpoints and, if it can't, to just start a local node. Here's how I start my node:

Ignition.Start(cfg);

Instead, it keeps trying to join, the timeout errors above keep being logged, and it never stops, so the application hangs indefinitely.

Am I missing some configuration that makes Ignite give up trying to connect and either start in local mode or fail altogether?

[Edit] This only happens when other apps running Ignite already form a cluster and this new node tries to join that cluster via static IPs (and its VM has a bad network config that prevents it from talking to the existing cluster). If I start this new node when no Ignite instances are running, it does NOT hang; it just goes ahead and starts a local Ignite node.


Solution

  • I ended up writing a quick blog post about all the Apache Ignite.NET experimentation I did which answers the question above
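
One setting that looks relevant to the hang described in the question is TcpDiscoverySpi.JoinTimeout. As far as I understand the Ignite documentation, it defaults to zero, which means "keep trying forever" when a static IP finder is used and none of the listed addresses respond. With a non-zero value, the node stops retrying once the timeout elapses and startup fails with an exception instead of blocking. Below is a minimal sketch, assuming the Apache.Ignite NuGet package. The endpoint address and the 30-second value are placeholders, not values from the question, and the exact exception type thrown on a join timeout is not confirmed here, so the sketch catches Exception.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Discovery.Tcp;
using Apache.Ignite.Core.Discovery.Tcp.Static;

public static class Program
{
    public static void Main()
    {
        var cfg = new IgniteConfiguration
        {
            DiscoverySpi = new TcpDiscoverySpi
            {
                // Give up joining after 30 seconds (placeholder value).
                // TimeSpan.Zero, the default, means retry indefinitely.
                JoinTimeout = TimeSpan.FromSeconds(30),
                IpFinder = new TcpDiscoveryStaticIpFinder
                {
                    // Placeholder address; use the real cluster endpoints.
                    Endpoints = new[] { "10.0.0.1:47500..47509" }
                }
            }
        };

        try
        {
            // Blocks until the node has joined or the join timeout elapses.
            using (var ignite = Ignition.Start(cfg))
            {
                Console.WriteLine("Node started: " + ignite.Name);
            }
        }
        catch (Exception ex)
        {
            // Assumption: a join timeout surfaces here as a startup failure.
            Console.WriteLine("Ignite failed to start: " + ex.Message);
        }
    }
}
```

Note that this makes startup fail rather than fall back to a standalone node. If a local-only node is wanted in that case, the catch block is the place to retry with a configuration whose endpoint list contains only the local address.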