We're facing an issue with the latest version of Ignite (v2.15.0). We're running three nodes, and one node goes down every day with the errors below.
The stack traces reference the data streamer; however, we do not use a data streamer anywhere in our code. Below are the exception logs.
Stopping local node on Ignite failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-8, igniteInstanceName=CasperIgniteCluster, finished=false, heartbeatTs=1693369119278]]]
2023-08-31 00:18:39.865 ERROR 8 --- [rIgniteCluster%] ROOT : Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-19, igniteInstanceName=CasperIgniteCluster, finished=false, heartbeatTs=1693455490076]]] org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-19, igniteInstanceName=CasperIgniteCluster, finished=false, heartbeatTs=1693455490076]
at sun.nio.ch.Net.poll(Native Method) ~[na:1.8.0_322]
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:953) ~[na:1.8.0_322]
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121) ~[na:1.8.0_322]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:462) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:693) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1181) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$Lambda$1772/1684778436.apply(Unknown Source) ~[na:na]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:691) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:442) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:231) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1105) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1052) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2102) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2195) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1279) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1318) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.sendDhtRequests(GridDhtAtomicAbstractUpdateFuture.java:476) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.map(GridDhtAtomicAbstractUpdateFuture.java:433) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1920) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1688) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:300) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicUpdateFuture.map(GridNearAtomicUpdateFuture.java:812) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicUpdateFuture.mapOnTopology(GridNearAtomicUpdateFuture.java:664) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:249) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.removeAllAsync0(GridDhtAtomicCache.java:1356) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.removeAll0(GridDhtAtomicCache.java:703) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll(GridCacheAdapter.java:3186) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearAtomicCache.removeAll(GridNearAtomicCache.java:549) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.removeAll(IgniteCacheProxyImpl.java:1585) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.removeAll(GatewayProtectedCacheProxy.java:1106) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerCacheUpdaters.updateAll(DataStreamerCacheUpdaters.java:94) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerCacheUpdaters$Batched.receive(DataStreamerCacheUpdaters.java:163) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:144) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7431) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:789) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.security.thread.SecurityAwareRunnable.run(SecurityAwareRunnable.java:51) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) ~[ignite-core-2.15.0.jar!/:2.15.0]
at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_322]
2023-08-29 03:52:05.423 ERROR 8 --- [rIgniteCluster%] ROOT : Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-24, igniteInstanceName=CasperIgniteCluster, finished=false, heartbeatTs=1693295501788]]] org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-24, igniteInstanceName=CasperIgniteCluster, finished=false, heartbeatTs=1693295501788]
at sun.nio.ch.Net.poll(Native Method) ~[na:1.8.0_322]
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:953) ~[na:1.8.0_322]
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121) ~[na:1.8.0_322]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:462) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:693) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1181) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$Lambda$1772/2003383653.apply(Unknown Source) ~[na:na]
at org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:691) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:442) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:231) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1105) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1052) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2102) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2195) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1279) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1318) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.sendDhtRequests(GridDhtAtomicAbstractUpdateFuture.java:476) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.map(GridDhtAtomicAbstractUpdateFuture.java:433) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1920) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1688) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3179) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$200(GridDhtAtomicCache.java:147) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$3.apply(GridDhtAtomicCache.java:270) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$3.apply(GridDhtAtomicCache.java:265) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1164) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:605) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:406) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:324) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:112) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:314) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:243) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637) ~[ignite-core-2.15.0.jar!/:2.15.0]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) ~[ignite-core-2.15.0.jar!/:2.15.0]
at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_322]
As part of the upgrade from Ignite v2.11.0 to v2.15.0, we also replaced clear() with removeAll(), since clear() was throwing an error.
As far as I can see, you've intentionally configured your nodes to behave like this. According to the log snippet, you seem to have configured a custom failure handler:
[hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]]
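My bet is that you've overridden the default handler with StopNodeFailureHandler and set ignoredFailureTypes to an empty collection. The empty set is what matters here: SYSTEM_WORKER_BLOCKED is no longer ignored, so a single blocked stripe worker (data-streamer-stripe-19 or sys-stripe-24 in your logs) is enough to stop the node.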
In general, you should have very solid reasons to modify the default set of ignored failure types. This list was introduced deliberately: it contains failure types (SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT) that may only indicate a potential or ongoing stability issue and do not necessarily mean that a node should be terminated immediately.
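To make the effect of that list concrete, here is a simplified, self-contained Java model of the check as I understand it from AbstractFailureHandler in Ignite 2.x: a failure only reaches the handler's action (stopping the node, for StopNodeFailureHandler) if its type is not in the ignored set. This is not Ignite's API. The FailureType enum below is a hypothetical redeclaration of a subset of org.apache.ignite.failure.FailureType, and IgnoredFailureTypesModel and stopsNode are illustrative names.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Simplified model (NOT Ignite's API) of how a failure handler treats its
 * ignored failure types. FailureType mirrors only a subset of the names in
 * org.apache.ignite.failure.FailureType.
 */
public class IgnoredFailureTypesModel {
    enum FailureType {
        SYSTEM_WORKER_BLOCKED,
        SYSTEM_CRITICAL_OPERATION_TIMEOUT,
        CRITICAL_ERROR
    }

    /** Assumed default: the two types AbstractFailureHandler ignores unless overridden. */
    static final Set<FailureType> DEFAULT_IGNORED =
        EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED, FailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT);

    /** Returns true if a failure of this type would reach the handler's action (stop the node). */
    static boolean stopsNode(Set<FailureType> ignored, FailureType type) {
        return !ignored.contains(type);
    }

    public static void main(String[] args) {
        // Equivalent of ignoredFailureTypes=UnmodifiableSet [] from the logs.
        Set<FailureType> emptyIgnored = EnumSet.noneOf(FailureType.class);

        System.out.println("default set, SYSTEM_WORKER_BLOCKED stops node: "
            + stopsNode(DEFAULT_IGNORED, FailureType.SYSTEM_WORKER_BLOCKED));
        System.out.println("empty set, SYSTEM_WORKER_BLOCKED stops node: "
            + stopsNode(emptyIgnored, FailureType.SYSTEM_WORKER_BLOCKED));
    }
}
```

Running main prints false for the default set and true for the empty set, which matches the behaviour in the logs: with an empty ignored set, a blocked worker goes straight to StopNodeFailureHandler.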
Review your configuration and fix it if this was not introduced intentionally. Refer to the docs; here is the relevant configuration snippet:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureHandler">
        <bean class="org.apache.ignite.failure.StopNodeFailureHandler"/>
    </property>
</bean>
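If the node is configured in Java rather than Spring XML, a roughly equivalent sketch follows. It assumes ignite-core 2.15 on the classpath and keeps StopNodeFailureHandler, but passes the two types that, as far as I know, AbstractFailureHandler ignores by default. The class name FailureHandlerConfig is illustrative only.

```java
import java.util.EnumSet;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeFailureHandler;

public class FailureHandlerConfig {
    /** Builds a node configuration with StopNodeFailureHandler and explicit ignored types. */
    public static IgniteConfiguration nodeConfig() {
        StopNodeFailureHandler hnd = new StopNodeFailureHandler();

        // Assumed to match the handler's built-in default. Passing an empty set
        // here instead would make SYSTEM_WORKER_BLOCKED (the failure in the
        // logs above) stop the node.
        hnd.setIgnoredFailureTypes(EnumSet.of(
            FailureType.SYSTEM_WORKER_BLOCKED,
            FailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT));

        return new IgniteConfiguration().setFailureHandler(hnd);
    }
}
```

The node would then be started with Ignition.start(FailureHandlerConfig.nodeConfig()). Leaving out the setIgnoredFailureTypes call should give the same result; it is spelled out only to make the intent visible to whoever edits the configuration next.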