
Apache Ignite: Node has not been connected to topology


In a local test setup with two nodes on the same machine (static IP configuration with port range 47500..47501), the second node won't join the cluster. It sends a TcpDiscoveryJoinRequestMessage that seems to be answered by the first node, yet once the network timeout (20 s) expires it logs a "Node has not been connected to topology" message and keeps sending discovery join requests, which the first node then ignores ("Ignoring join request message since node is already in topology").

The same happens in a real cluster setup on Docker machines (both bare metal and VM).

Is this a known issue? Any advice on where / what to look for? Ignite issues tons of logs (TcpDiscoverySpi), but I can't see any error or warning that might explain the behaviour. Static IP configuration and customized network timeout are in effect.

The configuration is given as YAML, which populates a configuration bean (Spring Boot application) that in turn builds the actual Ignite configuration.

grid:
  discovery:
    network-timeout: 20000
    join-timeout: 20000
    static:
      enabled: true
      addresses: 127.0.0.1:47500..47501

TcpDiscoveryVmIpFinder is in effect (as seen in the logs).
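
For readers unfamiliar with the `host:from..to` notation in the `addresses` entry, here is a minimal sketch of what such a range covers. `PortRangeSketch` and `expand` are hypothetical names introduced for illustration; this is not Ignite's own parser, only a model of the range syntax, and it ignores IPv6 literals.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of Ignite.
final class PortRangeSketch {
    /** Expands "host:from..to" into one "host:port" entry per port; "host:port" yields a single entry. */
    static List<String> expand(String address) {
        int colon = address.lastIndexOf(':');
        if (colon < 0)
            return List.of(address); // no port part, return unchanged

        String host = address.substring(0, colon);
        String ports = address.substring(colon + 1);

        int dots = ports.indexOf("..");
        int from = Integer.parseInt(dots < 0 ? ports : ports.substring(0, dots));
        int to = dots < 0 ? from : Integer.parseInt(ports.substring(dots + 2));

        List<String> result = new ArrayList<>();
        for (int port = from; port <= to; port++)
            result.add(host + ":" + port);

        return result;
    }

    public static void main(String[] args) {
        // The range from the question covers exactly two endpoints, one per local node.
        System.out.println(expand("127.0.0.1:47500..47501"));
    }
}
```

With the configuration above, the range yields `127.0.0.1:47500` and `127.0.0.1:47501`, so each of the two local nodes has one discovery port available.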

See also the relevant sections from the node logs (TcpDiscoverySpi).


Solution

  • As far as I can see, you use Ignite messaging, and some of your remote listeners hold an IgniteSemaphore as a field or as part of their closure. Information about these listeners is sent to all nodes in discovery messages when they connect.

    When a remote listener is deserialised, the semaphore is requested from the DataStructuresProcessor. But the processor hasn't been initialised yet, since the node join hasn't finished. This is a deadlock, because the node cannot join until the DataStructuresProcessor is initialised, and vice versa.

    You can avoid this problem by initialising the semaphore lazily:

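    import java.util.UUID;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteSemaphore;
    import org.apache.ignite.lang.IgniteBiPredicate;
    import org.apache.ignite.resources.IgniteInstanceResource;

    // Declare this as a nested class of your application class (hence 'static').
    public static class ListenerHandler implements IgniteBiPredicate<UUID, Object> {
        // Injected by Ignite on the node where the listener runs.
        @IgniteInstanceResource
        private Ignite ignite;

        // transient: not serialised with the listener, so deserialisation never touches the semaphore.
        private transient IgniteSemaphore sem;

        // Looks the semaphore up on first use instead of at deserialisation time.
        private IgniteSemaphore semaphore() {
            if (sem == null)
                sem = ignite.semaphore("sem", 1, true, true);

            return sem;
        }

        @Override public boolean apply(UUID uuid, Object o) {
            // ... (access the semaphore through semaphore(), never through the field)

            return true; // returning true keeps the listener subscribed
        }
    }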
    

    Related issue on the bug tracker: IGNITE-3089
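
To see why the transient, lazily initialised field helps, here is a self-contained sketch that uses plain Java serialisation and `java.util.concurrent.Semaphore` as stand-ins for Ignite's marshalling and `IgniteSemaphore`. All names (`LazyFieldSketch`, `lookupSemaphore`, `Handler`, `roundTrip`) are hypothetical; the only point is that deserialising the handler triggers no lookup, and the first call to `semaphore()` does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFieldSketch {
    /** Counts lookups so that the moment of resource creation is observable. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Hypothetical stand-in for ignite.semaphore(...). */
    static Semaphore lookupSemaphore() {
        LOOKUPS.incrementAndGet();
        return new Semaphore(1);
    }

    static class Handler implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: skipped by serialisation, so it is null after deserialisation.
        private transient Semaphore sem;

        Semaphore semaphore() {
            if (sem == null)
                sem = lookupSemaphore(); // first use, not deserialisation, triggers the lookup

            return sem;
        }
    }

    /** Serialises and deserialises the handler, roughly as shipping it to another node would. */
    static Handler roundTrip(Handler handler) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(handler);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Handler) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Handler copy = roundTrip(new Handler());
        System.out.println("lookups after deserialisation: " + LOOKUPS.get());

        copy.semaphore();
        copy.semaphore(); // second call reuses the cached instance
        System.out.println("lookups after first use: " + LOOKUPS.get());
    }
}
```

In the Ignite case the same ordering means the semaphore is requested only when a message is actually handled, by which time the node join has completed.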