Tags: json, d3.js, gremlin, amazon-neptune, gremlinpython

Gremlin: Find all the paths between two nodes and transform the query result into JSON format


I'm writing Gremlin Python queries against a Neptune database.

I want to find all the paths between a node A and a node B.

Then I would like to write the data into a JSON format that looks like this:

{ "nodes": [
    { "id": 1, "name": "A", "color":"red"},
    { "id": 2, "name": "B", "color":"green"},
    { "id": 3, "name": "C", "color":"green"}
  ],
  "links": [
    { "source": 1, "target": 2, "color":"blue" },
    { "source": 1, "target": 3, "color":"purple" },
    { "source": 3, "target": 2, "color":"blue" }
  ]}

That way the result is compatible with d3.js and I can load it directly into the d3 graphing library.

(In this case between A and B there would be the paths A->B and A->C->B)

I think I could use GraphSONWriter for this? Is that right?


Solution

  • Various Gremlin steps yield a result that is deserialized as a dict in Python (essentially JSON). The steps to look at include group, project, elementMap and valueMap. Gremlin itself only returns raw JSON if you call the HTTP endpoint directly, which is not the recommended approach.

    To achieve a result such as the one above, you would likely use elementMap combined with project, or perhaps union.
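As a rough illustration of the client-side half of that idea, the sketch below merges path results into the nodes/links shape from the question. It assumes a traversal along the lines of `g.V().has('name','A').repeat(outE().inV().simplePath()).until(has('name','B')).path().by(elementMap())`, and it further assumes each returned path has already been converted to a list of plain dicts alternating vertex, edge, vertex, with string keys `id`, `name` and `color`. In Gremlin Python the elementMap keys for id and label arrive as `T.id` and `T.label` enum values rather than strings, so some normalization step is needed first. The function name `paths_to_d3` and the sample data are hypothetical.

```python
import json


def paths_to_d3(paths):
    """Merge alternating vertex/edge/vertex paths into a D3-style dict.

    Assumption: every path is a list of plain dicts with string keys,
    vertices at even positions and edges at odd positions.
    """
    nodes = {}  # vertex id -> node dict, deduplicated across paths
    links = {}  # (source id, target id) -> link dict, deduplicated across paths
    for path in paths:
        vertices = path[0::2]
        edges = path[1::2]
        for v in vertices:
            nodes.setdefault(
                v["id"], {"id": v["id"], "name": v["name"], "color": v["color"]}
            )
        for src, edge, dst in zip(vertices, edges, vertices[1:]):
            # Keyed on the endpoint pair, so parallel edges between the
            # same two vertices collapse into a single link.
            links.setdefault(
                (src["id"], dst["id"]),
                {"source": src["id"], "target": dst["id"], "color": edge["color"]},
            )
    return {"nodes": list(nodes.values()), "links": list(links.values())}


# Hypothetical sample data mirroring the question: paths A->B and A->C->B.
a = {"id": 1, "name": "A", "color": "red"}
b = {"id": 2, "name": "B", "color": "green"}
c = {"id": 3, "name": "C", "color": "green"}
sample_paths = [
    [a, {"color": "blue"}, b],
    [a, {"color": "purple"}, c, {"color": "blue"}, b],
]

d3_graph = paths_to_d3(sample_paths)
print(json.dumps(d3_graph, indent=2))
```

Keeping the deduplication in Python rather than in the traversal keeps the Gremlin query simple, at the cost of sending repeated vertices over the wire when many paths share them.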

    When I am working with libraries like D3.js or Vis.js (which use JSON to build their visual models), I typically return the result from Gremlin as close to the final form as I can get it (within reason) and then create the final JSON in my code. A common flow that I use is this one.

    (1) Web client --> (2) API Gateway --> (3) Lambda --> (4) Neptune
    

    The steps in that flow are as follows:

    1. JavaScript in the web client calls the REST API, gets back a JSON result and creates the visualization.
    2. REST API endpoint that routes calls to Lambda functions.
    3. Lambda functions (written using Gremlin Python) that call into Neptune and then translate the results into JSON before sending them back to the client.
    4. Neptune endpoint that runs the Gremlin queries.
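Step 3 of the flow above might look roughly like the following Lambda handler. This is only a sketch under several assumptions: the API Gateway route uses Lambda proxy integration (so the handler returns a dict with `statusCode`, `headers` and a string `body`), the query-string parameter names `from` and `to` are made up for illustration, and `run_path_query` is a hypothetical placeholder that returns fixed data where a real function would run the Gremlin traversal against Neptune.

```python
import json


def run_path_query(source_name, target_name):
    """Hypothetical placeholder for the Gremlin call into Neptune.

    A real implementation would run the traversal and shape the result;
    fixed data is returned here so the sketch runs without a database.
    """
    return {
        "nodes": [
            {"id": 1, "name": source_name, "color": "red"},
            {"id": 2, "name": target_name, "color": "green"},
        ],
        "links": [{"source": 1, "target": 2, "color": "blue"}],
    }


def lambda_handler(event, context):
    # With proxy integration, query-string values arrive under this key,
    # which is None when the request carries no query string.
    params = event.get("queryStringParameters") or {}
    source = params.get("from")  # assumed parameter name
    target = params.get("to")    # assumed parameter name
    headers = {"Content-Type": "application/json"}
    if not source or not target:
        return {
            "statusCode": 400,
            "headers": headers,
            "body": json.dumps({"error": "both 'from' and 'to' are required"}),
        }
    graph = run_path_query(source, target)
    # The body must be a string, so the dict is serialized here.
    return {"statusCode": 200, "headers": headers, "body": json.dumps(graph)}
```

The web client in step 1 can then parse the response body and hand the `nodes` and `links` arrays to the visualization library.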