I'm a n00b to AWS.
I have a Lambda written in Java that processes S3 events from an SQS queue. The events are triggered by the creation of files in a specified directory in the S3 bucket.
The Lambda's processing of a single S3 event received from the queue (i.e. creating one file) works as expected.
If I create a batch of 5 to 10 files at the same time, multiple instances of the Lambda (usually 3 to 5) are started to process the resulting events. Some of them work without issue, but at least one (and sometimes more than one) fails. The behaviour is, somewhat frustratingly, inconsistent.
During the execution of a Lambda that fails, the first error occurs when it tries to connect to the AWS Secrets Manager:
com.amazonaws.http.conn.ssl.SdkTLSSocketFactory - connecting to secretsmanager.ap-southeast-2.amazonaws.com/<ip>:<port>
c.a.http.conn.ClientConnectionManagerFactory - java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... stack trace...
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to secretsmanager.ap-southeast-2.amazonaws.com:<port> [secretsmanager.ap-southeast-2.amazonaws.com/<ip>, secretsmanager.ap-southeast-2.amazonaws.com/<ip>, secretsmanager.ap-southeast-2.amazonaws.com/<ip>] failed: connect timed out
... stack trace...
Caused by: java.net.SocketTimeoutException: connect timed out
The Lambda retries the connection a couple more times, but it always fails. The Lambda code catches the exception and tries to do some cleanup, but then it also cannot connect to the S3 bucket:
com.amazonaws.http.conn.ssl.SdkTLSSocketFactory - Connecting socket to <s3 bucket>.s3.ap-southeast-2.amazonaws.com/<ip>:<port> with timeout 10000
c.a.http.conn.ClientConnectionManagerFactory - java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... stack trace...
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to <s3 bucket>.s3.ap-southeast-2.amazonaws.com:<port> [<s3 bucket>.s3.ap-southeast-2.amazonaws.com/<ip>] failed: connect timed out
... stack trace...
Caused by: java.net.SocketTimeoutException: connect timed out
Because the behaviour is inconsistent, I am not sure how to approach identifying the issue. I can't work out why some instances of the Lambda would fail completely while others running at the same time work without any problems.
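One way to narrow down failures like these is to have each invocation log, before doing any real work, whether it can open a TCP connection to the endpoints at all. Below is a minimal sketch of such a check. ConnectivityProbe is a hypothetical helper, not part of the original code, and it uses only java.net.Socket with a connect timeout. It assumes HTTPS on port 443, because the logs above redact the actual port, and it uses the regional S3 endpoint in place of the redacted bucket host. If some instances report reachable=false for both hosts while others report true, the network path of those instances is a more likely suspect than the SDK or the handler code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper (not part of the original Lambda).
final class ConnectivityProbe {

    private ConnectivityProbe() {}

    /**
     * Tries to open a plain TCP connection to host:port within timeoutMs.
     * Returns true if the connection succeeds, and false on a timeout, a refused
     * connection, a DNS failure or any other I/O error. No TLS handshake is done.
     */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: HTTPS on port 443; the logs in the question redact the real port.
        String[] hosts = {
            "secretsmanager.ap-southeast-2.amazonaws.com",
            "s3.ap-southeast-2.amazonaws.com" // stands in for the redacted bucket host
        };
        for (String host : hosts) {
            long start = System.nanoTime();
            boolean ok = canConnect(host, 443, 3000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s:443 reachable=%b (%d ms)%n", host, ok, elapsedMs);
        }
    }
}
```

In the Lambda itself, the same call could be made at the top of the handler and the result written to the CloudWatch log, so each failing invocation records whether it had basic network reachability.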
I am using the following libraries from com.amazonaws in my Java project:
aws-lambda-java-core: 1.2.0
aws-java-sdk-s3: 1.11.714
aws-java-sdk-events: 1.11.714
aws-java-sdk-secretsmanager: 1.11.718
aws-java-sdk-sqs: 1.11.719
Thanks in advance for any assistance.
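The issue was a networking one: one of the private subnets configured for the Lambda's VPC access had a misconfigured route table whose route pointed at a non-existent NAT gateway. Concurrent Lambda instances can run in any of the configured subnets, so only the instances placed in that subnet lost outbound access to Secrets Manager and S3, which explains why the failures were inconsistent.
Once the route table was updated to point at the correct NAT gateway, the Lambda worked as expected.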
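To see why only some instances failed, here is a simplified, hypothetical model of how a route table decides where a packet goes. The most specific (longest-prefix) matching route wins. A route whose target no longer exists is a blackhole: the VPC console and EC2 API report that route's state as blackhole, packets sent along it are dropped, and a TCP connect simply times out. The subnet names, the NAT gateway IDs nat-aaaa and nat-bbbb, and the destination 203.0.113.10 (a documentation address standing in for the redacted endpoint IPs) are all made up for illustration.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of VPC route-table lookup (IPv4 only).
final class RouteTableModel {

    /** One route: a destination CIDR and a target such as "local" or a NAT gateway ID. */
    static final class Route {
        final String cidr;
        final String target;

        Route(String cidr, String target) {
            this.cidr = cidr;
            this.target = target;
        }
    }

    private RouteTableModel() {}

    /** Converts a dotted-quad IPv4 address to a 32-bit int. */
    static int toInt(String ip) {
        int value = 0;
        for (String part : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    static int prefixLength(String cidr) {
        return Integer.parseInt(cidr.split("/")[1]);
    }

    /** True if ip falls inside cidr. A /0 route matches everything. */
    static boolean matches(String cidr, String ip) {
        int prefix = prefixLength(cidr);
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(cidr.split("/")[0]) & mask) == (toInt(ip) & mask);
    }

    /**
     * Picks the longest-prefix matching route. Returns its target, "blackhole" if
     * the target no longer exists (packets are dropped, so connects time out),
     * or "no route" if nothing matches.
     */
    static String resolve(List<Route> routes, Set<String> existingTargets, String destIp) {
        Route best = null;
        for (Route r : routes) {
            if (matches(r.cidr, destIp) && (best == null || prefixLength(r.cidr) > prefixLength(best.cidr))) {
                best = r;
            }
        }
        if (best == null) {
            return "no route";
        }
        if (!best.target.equals("local") && !existingTargets.contains(best.target)) {
            return "blackhole";
        }
        return best.target;
    }

    public static void main(String[] args) {
        Set<String> existingNatGateways = Set.of("nat-aaaa"); // nat-bbbb no longer exists
        List<Route> subnetA = List.of(new Route("10.0.0.0/16", "local"), new Route("0.0.0.0/0", "nat-aaaa"));
        List<Route> subnetB = List.of(new Route("10.0.0.0/16", "local"), new Route("0.0.0.0/0", "nat-bbbb"));
        String endpoint = "203.0.113.10";
        System.out.println("subnet A -> endpoint: " + resolve(subnetA, existingNatGateways, endpoint));
        System.out.println("subnet B -> endpoint: " + resolve(subnetB, existingNatGateways, endpoint));
        System.out.println("subnet B -> 10.0.1.5: " + resolve(subnetB, existingNatGateways, "10.0.1.5"));
    }
}
```

Running main prints nat-aaaa, blackhole and local on its three lines. Instances in subnet A reach the internet through the NAT gateway, while instances in subnet B can still reach addresses inside the VPC but time out on anything that needs the missing NAT gateway.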
Many thanks to John Rotenstein for his help with diagnosing this issue.