amazon-web-services aws-lambda amazon-vpc amazon-neptune

Problems connecting to Neptune from Lambda

I have created a simple AWS Neptune cluster, with a writer and no read replicas. I used the option to create a new VPC for it, and two security groups were automatically created for it, too.

I also have a Lambda that calls that Neptune cluster's endpoint. I have configured the Lambda with the Neptune cluster's VPC, specifying all of its subnets and the two security groups mentioned above. I didn't manually modify the inbound or outbound rules after they were automatically assigned when I went through the VPC configuration steps in the AWS Console.

The Lambda is written in Python and uses the requests library to make HTTPS calls signed with AWS Signature Version 4. The execution role for the Lambda has NeptuneFullAccess and an inline policy to allow configuring a VPC for the Lambda (which has been done, so that policy works).

The Lambda calls the Neptune cluster's endpoint, with the cluster's name and ID redacted, on port 8182:

https://NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182

I get the following error:

{
  "errorMessage": "2020-05-20T21:26:35.066Z c8ee70ac-6390-48fd-a32e-36f80d58a24e Task timed out after 3.00 seconds"
}

What am I doing wrong?

UPDATE: It looks like the second security group for the Neptune cluster was created because I selected the Create new option when creating the cluster. So, I tried again with the Choose existing option for the security group instead. (I guess I was confused before: I was creating a whole new VPC, so how could a security group already exist? But the wizard just assumes the default security group that will have been created by then.)

Now, I no longer get the same error. However, what I see is this:

{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: 48e3b4fb-1b88-48d3-8834-247dbb1a4f3f Error: Runtime exited without providing a reason"
}

The log shows this:

{
  "requestId": "b8b91c18-34cd-c5f6-9103-ed3357b9241e",
  "code": "BadRequestException",
  "detailedMessage": "Bad request."
}

The query was (given the Lambda code described in https://docs.amazonaws.cn/en_us/neptune/latest/userguide/iam-auth-connecting-python.html):

{
  "host": "NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182",
  "method": "GET",
  "query_type": "status",
  "query": ""
}

Any suggestions?

UPDATE: Trying against another Neptune cluster, the "[Errno 111] Connection refused" error comes back. I have noticed an odd thing, however: I have some orphaned network interfaces, from when the Lambda was associated with the VPCs of now-deleted Neptune clusters. The network interfaces are marked as in use, and I cannot detach or delete them, not even with the Force detachment option. I get the "You are not allowed to manage 'ela-attach' attachments" error.

UPDATE: I started over with a fresh Lambda (without redoing its VPC configuration, so no more orphaned network interfaces) and a fresh Neptune cluster with IAM Auth enabled and configured. I even gave the Lambda's execution role full admin access for debugging purposes, to rule out any missing permissions. I am still getting this error:

{
  "errorMessage": "HTTPSConnectionPool(host='NAME.cluster-ID.us-east-1.neptune.amazonaws.com', port=8182): Max retries exceeded with url: /status/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1f9f98c310>: Failed to establish a new connection: [Errno 111] Connection refused'))",
  "errorType": "ConnectionError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 71, in lambda_handler\n    return make_signed_request(host, method, query_type, query)\n",
    "  File \"/var/task/lambda_function.py\", line 264, in make_signed_request\n    r = requests.get(request_url, headers=headers, verify=False, params=request_parameters)\n",
    "  File \"/var/task/requests/api.py\", line 76, in get\n    return request('get', url, params=params, **kwargs)\n",
    "  File \"/var/task/requests/api.py\", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n",
    "  File \"/var/task/requests/sessions.py\", line 530, in request\n    resp = self.send(prep, **send_kwargs)\n",
    "  File \"/var/task/requests/sessions.py\", line 643, in send\n    r = adapter.send(request, **kwargs)\n",
    "  File \"/var/task/requests/adapters.py\", line 516, in send\n    raise ConnectionError(e, request=request)\n"
  ]
}

Solution

Thanks to the help of the Neptune team (an amazing response! they called me to discuss this), I was able to figure this out.

First, the Connection refused error disappeared once I redid the setup with a fresh Neptune cluster and the Choose existing option for the security group, as well as a brand new Lambda added to the Neptune cluster's VPC. Apparently, redoing the VPC configuration on a Lambda sometimes leaves orphaned network interfaces that are hard to delete. So, do the VPC config on a Lambda only once!
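While debugging this kind of problem, it can help to separate plain network reachability from anything related to signing or IAM. Below is a minimal sketch of a TCP probe using only the standard library. The names probe_tcp and lambda_handler are my own for illustration and are not from the AWS sample, and the event shape is assumed to match the query shown earlier (a "host" value that includes the port). As a rough rule of thumb, a timeout tends to point at security groups or routing silently dropping packets, while a refusal means something answered and rejected the connection, but treat that as a hint rather than a diagnosis.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try a plain TCP connect and classify the outcome as a short string."""
    try:
        # The timeout is kept below Lambda's 3-second default so the probe
        # reports a result instead of the whole invocation timing out.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


def lambda_handler(event, context):
    # Hypothetical handler: event["host"] is assumed to look like
    # "NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182".
    host, _, port = event["host"].partition(":")
    return {"tcp": probe_tcp(host, int(port or 8182))}
```

If the probe reports "open" but the signed request still fails, the problem is more likely in the request itself than in the VPC setup.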

Second, the runtime error that started showing up after that is due to a bug in the Python code provided by AWS here: https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-python.html

Namely, the make_signed_request function in that script doesn't return a value. It should return r.text or, better yet, json.loads(r.text). Then, everything works just fine.
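A minimal sketch of that change, assuming r is the response object returned by requests.get or requests.post inside make_signed_request. The helper name response_body is hypothetical (it is not part of the AWS script), and the SimpleNamespace object only stands in for a real response so the snippet runs without a network connection.

```python
import json
from types import SimpleNamespace


def response_body(r):
    """Return the response body parsed as JSON, or the raw text if it is not JSON."""
    try:
        return json.loads(r.text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return r.text


# Inside make_signed_request, the last line would then become something like:
#     return response_body(r)

# Stand-in for a requests response; only the .text attribute is used here.
fake = SimpleNamespace(text='{"status": "healthy"}')
result = response_body(fake)
print(result)
```

Falling back to the raw text keeps the Lambda from raising if the endpoint ever returns a non-JSON body.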