Why is my Gremlin query resulting in so many requests? Is this correct behavior?


I'm attempting to debug a performance issue I'm having with AWS Neptune. I am running some Gremlin queries and they seem to always result in 30 requests on the database. I'm wondering if I've done something wrong in my query.

The strange thing about this issue is that it started occurring all of a sudden. Previously, this was working totally fine and we weren't having performance issues.

Each call I make runs two queries, one for nodes and one for edges:

nodes = g.V(id).emit().repeat(__.out('manages')).dedup().project('label', 'name', 'job', 'department', 'manager').\
    by(__.id()).by('name').by('job').by('department').by('manager').toList()

id_list = list(map(lambda node: node["label"], nodes))

edges = g.V(id).emit().repeat(__.out('manages')).dedup().bothE('similar_to').dedup().\
    where(__.and_(__.inV().has(T.id, P.within(id_list)), __.outV().has(T.id, P.within(id_list)))).\
    project('from', 'to', 'similarity').by(__.outV().id()).by(__.inV().id()).by('similarity').toList()

Essentially, I have two edge types: manages and similar_to. I try to create a tree using the 'manages' edges, and then find all 'similar_to' edges within that tree.

This query gives the desired result, but is it unoptimized?


Solution

  • Both traversals follow pretty much the same path, which makes it easy to combine them:

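    # gremlin-python spells the filter step filter_; the outer
    # parentheses let the chain span several lines
    edges = (g.V(id).
      emit().
        repeat(__.out('manages')).
      aggregate('x').
      bothE('similar_to').dedup().
      filter_(__.otherV().where(P.within('x'))).
      project('from', 'to', 'similarity').
        by(__.outV().id()).
        by(__.inV().id()).
        by('similarity').
      toList())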
    

    And now I just realized that we can make it even simpler. Since you require both vertices connected by a similar_to edge to be part of x, every edge in the result must be an out-edge of one of the vertices in x. So instead of using bothE and otherV (which enables path tracking), we can just use outE and inV:

    # gremlin-python spells the filter step filter_; the outer
    # parentheses let the chain span several lines
    edges = (g.V(id).
      emit().
        repeat(__.out('manages')).
      aggregate('x').
      outE('similar_to').dedup().
      filter_(__.inV().where(P.within('x'))).  # outV is already guaranteed to be within "x"
      project('from', 'to', 'similarity').
        by(__.outV().id()).
        by(__.inV().id()).
        by('similarity').
      toList())
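
To sanity-check the claim that outE plus inV returns the same edges as bothE plus otherV, here is a small in-memory model in plain Python. It does not talk to Neptune or use gremlin-python at all. The vertex names, the MANAGES and SIMILAR_TO data, and the helper functions are all made up for illustration, and it assumes 'x' is fully collected before the edge step runs.

```python
from collections import deque

# Hypothetical toy graph (not from the question's data).
# 'manages' edges: manager -> direct reports
MANAGES = {"ceo": ["vp1", "vp2"], "vp1": ["eng1", "eng2"], "vp2": ["eng3"]}
# 'similar_to' edges: (out_vertex, in_vertex, similarity)
SIMILAR_TO = [
    ("eng1", "eng2", 0.9),
    ("eng2", "vp1", 0.4),
    ("eng3", "eng1", 0.7),
    ("eng1", "eng3", 0.2),
]


def subtree(root):
    """Model of emit().repeat(out('manages')).aggregate('x'): root plus all descendants."""
    seen, queue = {root}, deque([root])
    while queue:
        for child in MANAGES.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def edges_via_both(root):
    """Model of bothE('similar_to').dedup() filtered on otherV being in x."""
    x = subtree(root)
    found = set()  # the set plays the role of dedup()
    for v in x:
        for out_v, in_v, sim in SIMILAR_TO:
            if v == out_v and in_v in x:      # edge leaves v, other end is in_v
                found.add((out_v, in_v, sim))
            elif v == in_v and out_v in x:    # edge enters v, other end is out_v
                found.add((out_v, in_v, sim))
    return found


def edges_via_out(root):
    """Model of outE('similar_to') filtered on inV being in x."""
    x = subtree(root)
    return {(o, i, s) for (o, i, s) in SIMILAR_TO if o in x and i in x}


print(sorted(edges_via_out("vp1")))
```

Both functions return the same set for any root in this toy graph, because an edge whose two endpoints are in x is always reached as an out-edge of its own outV.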