Tags: java, network-programming, ignite

Establishing Client Connection to Ignite Cluster Causes OutOfMemoryError on Server


I am running into an interesting error when trying to connect an Ignite client to a cluster.

When I connect using the setup below, I get the following errors on the client and on the server:

Client-side logs:

24-Feb-2021 15:18:31.135 WARNING [tcp-client-disco-msg-worker-#4%igniteClientInstance%-#39%igniteClientInstance%] org.apache.ignite.logger.java.JavaLogger.warning Timed out waiting for message to be read (most probably, the reason is long GC pauses on remote node) [curTimeout=1000, rmtAddr=/XXX.XXX.XXX.XXX:yyyy, rmtPort=yyyy]
24-Feb-2021 15:18:31.137 WARNING [tcp-client-disco-msg-worker-#4%igniteClientInstance%-#39%igniteClientInstance%] org.apache.ignite.logger.java.JavaLogger.warning Failed to connect ...skipping...


Server-side log:

[14:58:09,536][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1037 milliseconds.
[14:58:10,536][SEVERE][grid-nio-worker-client-listener-0-#31][ClientListenerProcessor] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=4 lim=439 cap=8192], super=AbstractNioClientWorker [idx=0, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-client-listener-0, igniteInstanceName=null, finished=false, heartbeatTs=1614178688450, hashCode=645844509, interrupted=false, runner=grid-nio-worker-client-listener-0-#31]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=null, super=GridNioSessionImpl [locAddr=/10.0.1.81:10800, rmtAddr=/10.0.0.229:37584, createTime=1614178688450, closeTime=0, bytesSent=0, bytesRcvd=439, bytesSent0=0, bytesRcvd0=439, sndSchedTime=1614178688450, lastSndTime=1614178688450, lastRcvTime=1614178688450, readsPaused=false, filterChain=FilterChain[filters=[GridNioAsyncNotifyFilter, GridNioCodecFilter [parser=ClientListenerBufferedParser, directMode=false]], accepted=true, markedForClose=false]]]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioServerBuffer.read(ClientListenerNioServerBuffer.java:81)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:39)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3704)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
        at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1192)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2478)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2243)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
[14:58:10,539][SEVERE][grid-nio-worker-client-listener-0-#31][ClientListenerProcessor] Closing NIO session because of unhandled exception.
class org.apache.ignite.internal.util.nio.GridNioException: Java heap space
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2504)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2243)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioServerBuffer.read(ClientListenerNioServerBuffer.java:81)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:39)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3704)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
        at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1192)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2478)
        ... 4 more
[14:58:12,523][WARNING][grid-timeout-worker-#22][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.0.0.229:44098]

The client and server are on the same network but on separate machines. The server is also running within Kubernetes.

If I use a thin client, however, I am able to connect to the Ignite server and execute queries without issue.

Java thin client code:

ClientConfiguration cCfg = new ClientConfiguration();
cCfg.setAddresses("XXX.XXX.XXX.XXX:yyyy");
IgniteClient igniteC = Ignition.startClient(cCfg);

Java thick client code:

IgniteConfiguration cfg = (IgniteConfiguration) fsxmlac.getBean("igniteClient.cfg");
ignite = Ignition.start(cfg);

IgniteConfiguration XML:

<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!--
        Alter configuration below as needed.
    -->

    <bean class="org.apache.ignite.configuration.IgniteConfiguration" id="igniteClient.cfg">
        <property name="workDirectory" value="/ignite/work"/>
        <property name="clientMode" value="true" />
        <property name="dataStorageConfiguration" ref="dataStorageConfiguration" />
        <!--<property name="classLoader" ref="classLoader" /> -->
        <property name="igniteInstanceName" value="igniteClientInstance" />
        <property name="peerClassLoadingEnabled" value="false" />
        <property name="metricsLogFrequency" value="1000000" />
        <property name="communicationSpi" ref="communicationSpi" />
        <property name="discoverySpi" ref="discoverySpi" />
        <property name="cacheConfiguration">
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="session-cache"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <property name="backups" value="1"/>
                <!--
                <property name="evictionPolicy">
                    <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
                        <property name="maxSize" value="150000"/>
                    </bean>
                </property>
                -->
            </bean>
        </property>
    </bean>

    <bean id="dataStorageConfiguration" class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
            </bean>
        </property>
        <property name="walPath" value="/ignite/wal"/>
        <property name="walArchivePath" value="/ignite/walarchive"/>
    </bean>
    
    <bean id="communicationSpi" class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="slowClientQueueLimit" value="1000" />
        <property name="localPort" value="32609" />
    </bean>

    <bean id="discoverySpi" class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ackTimeout" value="1000"/>
        <property name="socketTimeout" value="2000"/>
        <property name="ipFinder" ref="ipFinder" />
    </bean>

    <bean id="ipFinder" class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="shared" value="false" />
        <property name="addresses">
            <list>
                <value>XXX.XXX.XXX.XXX:yyyy</value>
            </list>
        </property>
    </bean>

    <!--<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration"/> 
    -->
</beans>

Solution

  • For anyone who stumbles onto this: in the end, what I was trying to do was impossible. My Tomcat application runs outside Kubernetes, while my Ignite server runs inside it.

    According to the link below, this setup prevents the use of a thick client. As the thick client starts up, it attempts to establish direct communication with every Ignite server node in the cluster, but the Kubernetes load balancer gets in the way of this. The thin client only needs a single reachable endpoint, which is why it worked.

    This is why the discovery SPI was able to establish communication but the communication SPI failed.

    Ignite Kubernetes setups.
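
One rough way to see the problem described above from the client machine is to test whether each server node's communication port accepts a direct TCP connection. The sketch below uses only the JDK and is not part of Ignite. The class name NodeReachabilityCheck and its methods are hypothetical, and the addresses you pass in are whatever your server pods report as their own. A successful connect does not prove that a thick client will work, but nodes that cannot be reached individually are consistent with the failure in the question.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an Ignite API: checks direct TCP reachability of server nodes.
public class NodeReachabilityCheck {

    /** Splits "host:port" into an unresolved socket address. */
    static InetSocketAddress parseAddress(String hostAndPort) {
        int sep = hostAndPort.lastIndexOf(':');
        if (sep <= 0 || sep == hostAndPort.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got: " + hostAndPort);
        }
        String host = hostAndPort.substring(0, sep);
        int port = Integer.parseInt(hostAndPort.substring(sep + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat all as "not reachable".
            return false;
        }
    }

    /** Checks every address and keeps the results in input order. */
    static Map<String, Boolean> checkAll(List<String> addresses, int timeoutMillis) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String address : addresses) {
            InetSocketAddress parsed = parseAddress(address);
            results.put(address,
                isReachable(parsed.getHostString(), parsed.getPort(), timeoutMillis));
        }
        return results;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java NodeReachabilityCheck host:port [host:port ...]");
            return;
        }
        checkAll(Arrays.asList(args), 2000).forEach((address, ok) ->
            System.out.println(address + " -> " + (ok ? "reachable" : "NOT reachable")));
    }
}
```

Run it from the machine hosting the client application and pass each server node's own address and communication port (47100 is the TcpCommunicationSpi default, assuming the servers do not override it), not the load balancer address.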