
aerospike connect timeout works incorrectly?


I'm using Aerospike Java client v6.0.1 with the following settings on the client's default read policy:

        clientPolicy.readPolicyDefault.connectTimeout = 1000;
        clientPolicy.readPolicyDefault.socketTimeout = 30;
        clientPolicy.readPolicyDefault.totalTimeout = 110;
        clientPolicy.readPolicyDefault.maxRetries = 2;
        clientPolicy.readPolicyDefault.sleepBetweenRetries = 0;

From time to time I get the following errors, which show that a timeout occurred before all retries were used:

org.springframework.dao.QueryTimeoutException: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false


org.springframework.dao.QueryTimeoutException: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false

Does this mean that the total operation timeout also includes the time spent connecting to the Aerospike node? The Aerospike docs state that the total timeout starts only after the connect phase finishes:

"If connectTimeout is greater than zero, it will be applied to creating a connection plus optional user authentication and TLS handshake. When the connect completes, socketTimeout/totalTimeout is then applied. In this case, totalTimeout starts after the connection completes."

See https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852

99% of my requests to Aerospike take less than 20 ms, so it doesn't make sense to me to increase the total timeout.

Originally I had a 200-300 ms connect timeout and increased it to 1000 ms, but it didn't help much.


Solution

  • Transactions can sometimes time out before they have started. For example, async transactions can be throttled and can sit in the delay queue for longer than totalTimeout. If this occurs, a timeout exception is generated with iteration=0.

    Anytime totalTimeout is reached, the transaction is cancelled regardless of the number of retries.

    If connectTimeout is used and a new connection is required (no available connections in the pool) for the transaction, the connectTimeout is applied to connection creation and the totalTimeout stopwatch does not start until the new connection is created.

    If connectTimeout is used and an existing connection is available from the pool, the connectTimeout is not applicable and the totalTimeout stopwatch starts from the beginning of the transaction.

    Since most transactions are able to obtain connections from the pool, it's not surprising that increasing connectTimeout has little effect.
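
    The rules above can be illustrated with a small toy model. This is only a sketch and not the Aerospike client's actual implementation: the class TimeoutModel, the method timeoutIteration, and the assumption that each attempt is cut off after min(socketTimeout, remaining totalTimeout) are all hypothetical simplifications. It assumes a pooled connection, so the totalTimeout stopwatch starts at the beginning of the transaction and any time spent waiting before the first attempt counts against it.

```java
public final class TimeoutModel {
    /** An attempt completed within both the socket and the total budget. */
    static final int SUCCESS = -1;
    /** Every attempt hit socketTimeout, but totalTimeout was never reached. */
    static final int RETRIES_EXHAUSTED = -2;

    /**
     * Toy model: returns the iteration at which totalTimeout fires,
     * or SUCCESS / RETRIES_EXHAUSTED.
     *
     * queuedMs  - time spent before the first attempt (e.g. in a delay queue)
     * attemptMs - how long each attempt would need to complete (hypothetical input)
     */
    static int timeoutIteration(long queuedMs, long[] attemptMs,
                                long socketMs, long totalMs, int maxRetries) {
        if (attemptMs.length < maxRetries + 1) {
            throw new IllegalArgumentException("need one duration per possible attempt");
        }
        long elapsed = queuedMs;
        if (elapsed >= totalMs) {
            return 0; // expired before the transaction even started
        }
        for (int i = 0; i <= maxRetries; i++) {
            long remaining = totalMs - elapsed;
            long budget = Math.min(socketMs, remaining);
            if (attemptMs[i] <= budget) {
                return SUCCESS;
            }
            elapsed += budget; // attempt abandoned once its budget is used up
            if (elapsed >= totalMs) {
                return i; // total timeout cancels the transaction, retries or not
            }
        }
        return RETRIES_EXHAUSTED;
    }
}
```

    With socket=30 and total=110 from the question, 120 ms of queueing yields iteration 0 in this model, and 60 ms of queueing followed by two attempts that each need 40 ms yields iteration 1, even though maxRetries=2 would allow a third attempt.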