Tags: amazon-web-services, websocket, aws-lambda, gremlin-server, amazon-neptune

How to make AWS Lambda work reliably with Neptune over websockets?


So I've built this API. It consists of a Lambda function (accessible via API Gateway) which talks to a Neptune graph database instance via websockets.

Everything is wired up and working. But I recently started noticing intermittent 500s coming from the API. After some investigation I found that the Neptune Gremlin server was dropping or refusing connections whenever multiple requests came in close together.

I found this page, which suggests that the ephemeral nature of serverless doesn't play nice with websockets, so the websocket connection should be closed manually after each request. But after implementing that I found no difference: still 500s.

The page also suggests that when using Gremlin on Neptune you should probably send HTTP requests to Neptune rather than using websockets:

Alternatively, if you are using Gremlin, consider submitting requests to the Gremlin HTTP REST endpoint rather than the WebSockets endpoint, thereby avoiding the need to create and manage the lifetime of a connection pool.

The downside to this approach is that we would then have to use string-based queries (which means re-writing a large portion of the project). Another downside is that the Gremlin HTTP endpoint returns pretty unstructured data.
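To illustrate what "string-based" means here, below is a minimal sketch of building such a request with only the standard library. It assumes the HTTP endpoint accepts a POST with a JSON body of the form {"gremlin": "<query>"}; the helper name build_gremlin_request and the example host are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_gremlin_request(endpoint, query):
    """Build (but do not send) a POST request carrying a Gremlin query string."""
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint, used only for illustration.
request = build_gremlin_request(
    "https://neptune.example.com:8182/gremlin",
    "g.V().limit(1)",
)
# urllib.request.urlopen(request) would send it; the JSON reply then has to be
# unpacked by hand, which is the "unstructured data" drawback mentioned above.
```

The query travels as a plain string, so traversals written with the gremlin_python fluent API would have to be re-expressed as text.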

So what I'm wondering is whether anyone has got Lambda reliably talking to Neptune over websockets? If so, how?

Edit:

Since I'm using the AWS Chalice framework I don't think I really have direct access to the handler function. Below is what my lambda looks like.

[Image of the Lambda function not available]

And here is the code that connect() is calling:

import os

from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection


def connect():
    # GRAPH_DB holds the Neptune connection string
    conn_string = os.environ.get('GRAPH_DB')
    # g is module-level, so it is shared by everything running in this Lambda instance
    global g
    g = Graph().traversal().withRemote(DriverRemoteConnection(conn_string, 'g'))

So when the app starts (when a lambda instance is spun up), that connect function is called and the app gets a connection to Neptune. From there the app passes around that global g variable so as to use the same connection instance for that invocation. I was then calling close() on the DriverRemoteConnection object before returning the results of a request (and that's where I found I was still getting 500s).


Solution

  • Yes, it is possible to use WebSockets within a Lambda function to communicate with Neptune. There are different nuances for doing this depending on the programming language that you're using. Ultimately, it comes down to instantiating the client connection and closing the connection within the handler() of the Lambda function.

    If using Java [1], you can create the cluster object outside of the handler so that it can be reused across Lambda invocations. But the client that is configured from that cluster object must be instantiated and closed during each invocation.

    Do you have a snippet of code that you're using that you could share for review?

    [1] https://docs.aws.amazon.com/neptune/latest/userguide/best-practices-gremlin-java-close-connections.html
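
For Python, the open-and-close-per-invocation pattern described above might be sketched as follows. This is an illustration under stated assumptions, not a confirmed fix for the 500s. The names open_neptune_connection, scoped_connection and handler are hypothetical, GRAPH_DB is the environment variable from the question, and the gremlin_python calls mirror those in the question (newer gremlin_python releases also offer a snake_case with_remote).

```python
import os
from contextlib import contextmanager


def open_neptune_connection():
    """Open a new websocket connection using the question's GRAPH_DB variable."""
    # Imported lazily so this module still loads where gremlin_python is absent.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    return DriverRemoteConnection(os.environ["GRAPH_DB"], "g")


@contextmanager
def scoped_connection(factory=open_neptune_connection):
    """Yield a connection from factory() and always close it afterwards."""
    conn = factory()
    try:
        yield conn
    finally:
        conn.close()  # runs on success and on error alike


def handler(event, context):
    """Hypothetical handler: one connection per invocation, closed before returning."""
    from gremlin_python.structure.graph import Graph
    with scoped_connection() as conn:
        g = Graph().traversal().withRemote(conn)
        vertex_count = g.V().limit(10).count().next()
    return {"statusCode": 200, "body": str(vertex_count)}
```

With Chalice, the same with block could presumably sit inside a route function, since that function runs once per request. Passing a different factory to scoped_connection also makes the close-on-exit behaviour easy to check without a live Neptune instance.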