
Ignite SqlQuery timeout & cancellation - If/when does QueryCancelledException get thrown


As the Javadoc suggests, one can set a timeout for a SqlQuery by calling setTimeout: https://ignite.apache.org/releases/2.4.0/javadoc/org/apache/ignite/cache/query/SqlQuery.html#setTimeout-int-java.util.concurrent.TimeUnit-

The Javadoc for QueryCancelledException also says that this checked exception is thrown if a query is cancelled or times out while executing: https://ignite.apache.org/releases/2.4.0/javadoc/org/apache/ignite/cache/query/QueryCancelledException.html

The same mechanism is described here as a way to cancel or time out long-running queries: https://apacheignite-sql.readme.io/v2.4/docs/query-cancellation

Strangely, though, the Javadoc for the IgniteCache.query(..) methods, https://ignite.apache.org/releases/2.4.0/javadoc/org/apache/ignite/IgniteCache.html#query-org.apache.ignite.cache.query.Query-, does not declare this checked exception, or any checked exception for that matter, as being thrown. The same goes for the QueryCursor.getAll() method. This makes it unclear where and how to handle query timeouts.

I wrote the code below, but I am unable to make the query time out, so I cannot quickly test that code path and see whether it's correct. I am hoping the exception will be thrown both by the IgniteCache.query(..) method and by QueryCursor.getAll() and its related methods.

Apparently the minimum timeout granularity for SqlQuery.setTimeout(int timeout, TimeUnit timeUnit) is TimeUnit.MILLISECONDS, which I realized during initial testing. This makes it harder to force a timeout for testing.

Does the code below look right? (I want to avoid the cursor methods and rely on IgniteCache.query(..), called inside the try-with-resources, to detect the timeout.) Will this work?

@Scheduled(fixedDelayString = "${checkInterval}", initialDelayString = "${checkDelay}")
private final void monitorHealth() {
    if(!isReady) {
        return;
    }
    try (QueryCursor<Entry<Integer, FabricInfo>> cursor = fabricInfoCache.query(SQL_QUERY)) {
        cursor.iterator();
        // Reset the query time out counter..
        if(retryCount != 0) {
            retryCount = 0;
            LOGGER.warn("Client health check query executed without timing out before the configured maximum number of timeout retries was reached. Resetting retryCount to zero.");
        }
    } catch (Exception e) {
        if(e.getCause() instanceof QueryCancelledException) {
            retryCount++;
            LOGGER.warn("Client health check query timed out for the {} time.", retryCount);

            if(retryCount > QUERY_MAX_RETRIES_VALUE) {
                // Query timed out the maximum number of times..
                LOGGER.error("Client health check query timed out repeatedly for the maximum number of times configured : {}. Initiating a disconnect-reconnect.", retryCount);
                reconnectAction();
            }
        } else {
            if (e.getCause() instanceof IgniteClientDisconnectedException) {
                LOGGER.error("Client health check query failed due to client node getting disconnected from cluster. Initiating a disconnect-reconnect.", e.getCause());
            } else {
                // Treat other failures (CacheStoppedException, etc.) the same as IgniteClientDisconnectedException.
                LOGGER.error("Client health check query failed. Initiating a disconnect-reconnect.", e.getCause());
            }
            reconnectAction();
        }
    }
}

Thanks, Muthu


Solution

  • QueryCancelledException is thrown from methods of QueryCursor, wrapped in an IgniteException, which is a subclass of RuntimeException. That is why none of the method signatures declare a checked exception.

    The query is not executed as soon as you call the IgniteCache#query(...) method. Execution only starts when the QueryCursor#iterator() method is called.

    For an example, see the following test in the Ignite project, which checks that query cancellation and timeouts are respected: IgniteCacheLocalQueryCancelOrTimeoutSelfTest.
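Since the exception arrives wrapped, the handler has to inspect the cause chain rather than the caught exception itself. Below is a minimal sketch of such a check in plain Java, assuming the wrapping may be more than one level deep. It does not depend on Ignite: FakeQueryCancelledException is a hypothetical stand-in for org.apache.ignite.cache.query.QueryCancelledException, and hasCause is an illustrative helper, not part of the Ignite API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CauseChainDemo {

    /** Hypothetical stand-in for Ignite's checked QueryCancelledException. */
    static class FakeQueryCancelledException extends Exception {
        FakeQueryCancelledException(String msg) {
            super(msg);
        }
    }

    /** Returns true if t, or any exception in its cause chain, is an instance of type. */
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates a timeout wrapped twice in unchecked exceptions.
        Throwable wrapped = new RuntimeException("outer",
            new IllegalStateException("middle",
                new FakeQueryCancelledException("timed out")));

        System.out.println(hasCause(wrapped, FakeQueryCancelledException.class));
        System.out.println(hasCause(new RuntimeException("other"),
            FakeQueryCancelledException.class));
    }
}
```

In the health check above, a call such as hasCause(e, QueryCancelledException.class) could replace the e.getCause() instanceof test, so the timeout branch still triggers if the exception is nested more than one level deep.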