
How to calculate PageRank and shortest paths with Gremlin in Amazon Neptune?


Is there a way to calculate PageRank and shortest paths with Gremlin in Amazon Neptune? According to the Gremlin documentation, PageRank centrality can be calculated with the pageRank() step, which is designed to work with GraphComputer (OLAP) based traversals.

I tried to create a traversal with gremlinpython using this code: g = graph.traversal().withComputer().withRemote(remoteConn) but I got this error: GremlinServerError: 499: {"code":"UnsupportedOperationException","requestId":"4493df8b-b09f-47b1-b230-b83cfe1afa76","detailedMessage":"Graph does not support graph computer"}

So is it possible to use a GraphComputer traversal in Amazon Neptune?


Solution

  • Amazon Neptune does not currently support the Apache TinkerPop GraphComputer interface. You have a few options.

    1. In some cases it is possible to use the example queries in the Gremlin Recipes document to calculate connected components, shortest paths, and similar results.
    2. Export the data using the Neptune Export tool and run the analysis you need to do using Spark (Glue and EMR are good options). This is quite commonly done today.
    3. For modestly sized datasets you can import the data into NetworkX and run the whole analysis from a Jupyter Notebook.
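
    As a rough illustration of option 3, here is a minimal sketch of what the client-side analysis computes once the edges are in memory. NetworkX provides ready-made pagerank and shortest_path functions for this; the sketch below uses only the Python standard library so that it runs anywhere. The edge list, the helper names (build_adjacency, pagerank, shortest_path) and the damping value of 0.85 are assumptions made for the example, not anything taken from Neptune or from the question.

```python
from collections import deque

def build_adjacency(edges):
    """Map each vertex to the list of vertices its outgoing edges point to."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])  # make sure sink vertices appear as keys too
    return adj

def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph; ranks sum to 1."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in adj}
        for v, targets in adj.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

def shortest_path(adj, start, goal):
    """Unweighted shortest path via breadth-first search, or None if unreachable."""
    if start not in adj or goal not in adj:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:
                path.append(v)
                v = previous[v]
            return path[::-1]
        for t in adj[v]:
            if t not in previous:
                previous[t] = v
                queue.append(t)
    return None

# Hypothetical edge list; with Neptune these (source, target) pairs would come
# from an export or from a Gremlin query that returns each edge's endpoints.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "a")]
adj = build_adjacency(edges)
ranks = pagerank(adj)
path = shortest_path(adj, "a", "d")
print(ranks)
print(path)
```

    This only makes sense when the whole edge list fits comfortably in memory; for larger graphs the Spark route in option 2 is the better fit.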

    UPDATE 2024-01-29

    In December 2023, Neptune Analytics was released. It includes support for built-in algorithms, including PageRank and shortest path computation. The algorithms are described in the Neptune Analytics documentation.