This is the scenario
"org.apache.ignite.logger.java.JavaLogger" "info" "INFO" "" "284" "Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:62856]" "" "" "" "" "" "" "1587013526124" "" "" "" "" "" "" "ROOT" "{""service"":"""",""logger_name"":""org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi""}"
[10:37:57] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_CRITICAL_OPERATION_TIMEOUT, err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been timed out.]] [10:37:57,739][SEVERE][exchange-worker-#46][GridCacheDatabaseSharedManager] Checkpoint read lock acquisition has been timed out. class org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out. at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1708) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1640) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initTopologies(GridDhtPartitionsExchangeFuture.java:1078) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:944) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3258) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3104) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748) [10:39:21,547][SEVERE][tcp-disco-msg-worker-[693d29cd 0:0:0:0:0:0:0:1%lo0:47501 crd]-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#59, blockedFor=209s]
I am assuming here that since I am force shutting down the Java application which starts Ignite Client node, it's possible that there would be some topology imbalance that might happen.
Can someone please suggest, if at all I force kill the Client application, is there a correct way to restart the Client application such that it'll continue re-establishing the connection with Ignite server and continue working?
This scenario is possible when you have very long timeouts.
You should not expect node to be dropped, and a new one to join, before all timeout runs off, such as, network timeout, socket write timeout, failure detection timeout. That, unless you do graceful shutdown.