amazon-web-services aws-lambda aws-sdk aws-java-sdk

Why is the AWS Lambda invocation client incorrectly throwing ClientExecutionTimeoutException?


We hit this problem deterministically and aren't sure what we have misconfigured. For lambdas that run for less than ~5 minutes, our invocation successfully returns ~0.5 seconds after the lambda completes. For anything that runs longer, the lambda logs show that the function completes, but our client invocation throws a ClientExecutionTimeoutException after 15 minutes.

After encountering the problem with other (otherwise successful) lambdas, we created a basic test lambda on Node with a sleep function and have been able to deterministically reproduce the issue:

function sleep(s) {
  return new Promise(resolve => setTimeout(resolve, s * 1000));
}
// sleep() takes seconds, so 60 * 5 is a 5-minute sleep
const sleepSeconds = 60 * 5;
exports.handler = async (event) => {
    console.log(`received lambda invocation, sleeping ${sleepSeconds} seconds`);
    const response = {
        statusCode: 200,
        body: JSON.stringify(`finished running, slept for ${sleepSeconds} seconds`),
    };
    await sleep(sleepSeconds);
    console.log('finished sleeping');
    return response;
};

Our lambda invocation client is using these client configs:

clientConfig.setRetryPolicy(PredefinedRetryPolicies.NO_RETRY_POLICY);
clientConfig.setMaxErrorRetry(0);
clientConfig.setSocketTimeout(15 * 60 * 1000);
clientConfig.setRequestTimeout(15 * 60 * 1000);
clientConfig.setClientExecutionTimeout(15 * 60 * 1000);

Is there a ~5 minute timeout config we're missing?


Solution

  • The Javadoc in aws-sdk-java says:

     For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout or keep-alive settings.
    

    Also, AWS Lambda executions were previously limited to 5 minutes; this limit was later increased to 15 minutes.

    I would:

    1. Check that the client SDK version is up to date.
    2. Check that the connection is not being closed by your network.
    3. Move to an async invocation via AWSLambdaAsyncClient.invokeAsync() for long-running invocations.
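
As a rough illustration of points 2 and 3, here is a minimal configuration sketch for the AWS SDK for Java 1.x. It is not the asker's actual setup: the function name "sleep-test" and the empty payload are hypothetical, and whether TCP keep-alive helps depends on an assumption this thread does not confirm, namely that something on the network path drops idle connections after roughly 5 minutes. Note also that it requests an asynchronous invocation through InvocationType.Event on the regular client rather than through AWSLambdaAsyncClient.invokeAsync(), which is a different route to a similar result.

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.InvocationType;
import com.amazonaws.services.lambda.model.InvokeRequest;
import com.amazonaws.services.lambda.model.InvokeResult;

public class LongRunningInvokeSketch {
    // Hypothetical function name, used only for illustration.
    private static final String FUNCTION_NAME = "sleep-test";

    public static void main(String[] args) {
        int fifteenMinutesMs = 15 * 60 * 1000;

        ClientConfiguration clientConfig = new ClientConfiguration()
                .withMaxErrorRetry(0)
                .withSocketTimeout(fifteenMinutesMs)
                .withRequestTimeout(fifteenMinutesMs)
                .withClientExecutionTimeout(fifteenMinutesMs)
                // Assumption: an idle connection is being dropped by the network.
                // This only asks the OS to send keep-alive probes; the probe
                // interval itself is an operating-system setting.
                .withTcpKeepAlive(true);

        AWSLambda lambda = AWSLambdaClientBuilder.standard()
                .withClientConfiguration(clientConfig)
                .build();

        // Event = asynchronous invocation: Lambda queues the request and
        // responds without waiting for the function to finish, so the
        // function's return value is NOT delivered to this caller.
        InvokeRequest request = new InvokeRequest()
                .withFunctionName(FUNCTION_NAME)
                .withInvocationType(InvocationType.Event)
                .withPayload("{}");

        InvokeResult result = lambda.invoke(request);
        System.out.println("status: " + result.getStatusCode());
    }
}
```

With an Event invocation the caller no longer holds a connection open for the duration of the function, so the long timeouts matter less; the trade-off is that the result has to be collected some other way (for example from logs or from something the function writes out). If the synchronous response is required, the keep-alive setting and the network path are the parts worth investigating first.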