Ignite version v2.8.1-1
I have configured RestartProcessFailureHandler to handle critical system errors such as SYSTEM_WORKER_BLOCKED. However, when the error occurs, the restart never happens, even after hours. Is this expected behavior?
I do see log entries indicating that a restart has been requested, but it seems it was never executed.
As an alternative, if the failure handler is not suitable for this case, I am thinking of enabling the REST API as a liveness check for the service and restarting the service when the check fails. Please advise.
Thanks.
[2022-03-08T02:14:32,561][ERROR][disco-event-worker-#44%ignite-instance%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=RestartProcessFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2796) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.8.1.jar:2.8.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
...
[2022-03-08T02:14:32,603][ERROR][node-restarter][] Restarting JVM on Ignite failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]] ....
There is no standard way for a JVM to restart itself from within a Java application, so Ignite relies on external tools to provide that capability. According to the docs for org.apache.ignite.failure.RestartProcessFailureHandler
https://ignite.apache.org/docs/2.11.1/perf-and-troubleshooting/handling-exceptions#failures-handling
the standard ignite.sh|bat scripts support restarting when the JVM process exits with the restart exit code.
If you run Ignite as part of your application, you can write your own startup script and pass IGNITE_SUCCESS_FILE=<path to a marker file, which Ignite creates when a restart is requested> as a JVM option when starting the Java process. When this failure handler fires, the JVM exits with org.apache.ignite.IgniteSystemProperties#IGNITE_RESTART_CODE; your script then needs to check that the IGNITE_SUCCESS_FILE marker was created, remove it, and start the JVM again.
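
The steps described above could be sketched as a small Java launcher instead of a shell script. This is only an illustration under stated assumptions, not what ignite.sh does internally: the class name RestartLoop and the helper shouldRestart are made up for this example, the value 250 is my assumption for the restart exit code (please verify it against your Ignite version), and passing the marker as -DIGNITE_SUCCESS_FILE assumes the property is read as a JVM system property.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs the Ignite JVM as a child process and
// starts it again whenever a restart was requested.
final class RestartLoop {
    // ASSUMPTION: exit code used for restart requests; check your Ignite version.
    static final int ASSUMED_RESTART_EXIT_CODE = 250;

    // Returns true only if the child exited with the restart code AND the
    // marker file exists. The marker is removed as part of the check so
    // that the next run starts clean.
    static boolean shouldRestart(int exitCode, Path marker, int restartCode) throws IOException {
        if (exitCode != restartCode)
            return false;
        return Files.deleteIfExists(marker);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: RestartLoop <marker-file> <java args for the Ignite app...>");
            System.exit(2);
        }

        Path marker = Paths.get(args[0]).toAbsolutePath();

        // Child command line: java -DIGNITE_SUCCESS_FILE=<marker> <remaining args>
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-DIGNITE_SUCCESS_FILE=" + marker);
        cmd.addAll(Arrays.asList(args).subList(1, args.length));

        int exitCode;
        do {
            Files.deleteIfExists(marker); // clear any stale marker before each start
            Process child = new ProcessBuilder(cmd).inheritIO().start();
            exitCode = child.waitFor();
        } while (shouldRestart(exitCode, marker, ASSUMED_RESTART_EXIT_CODE));

        System.exit(exitCode); // propagate the final, non-restart exit code
    }
}
```

It would be started with something like `java RestartLoop /tmp/ignite_success -cp app.jar com.example.Main`, where the marker path, classpath, and main class are placeholders for your own values.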