gremlin, amazon-neptune, tinkerpop3

Getting children and grandchildren in a single query in Gremlin


I am currently rewriting some queries from Cypher to Gremlin. I want to create a single query that, for a specific starting node, returns:

  1. Up to 10 children, selected by the edge property 'prob': the children with the highest probability, sorted by 'prob' in descending order.
  2. For every child we would like to continue and get up to 10 grandchildren with the highest probability (similar to point 1).

I attached an image showing the result we would like to get, simplified to return up to 2 nodes per level instead of 10. In the result, we would also like to get all the properties of the children and grandchildren.

Thank you!

[Image: Example of graph traversal]

Edit: I came up with the following solution. Maybe someone can point out a better approach, but it seems the query returns the correct result.

g.V('123')
  .inE().order().by('prob', Order.desc).limit(10).outV().as('c')
  .project('c', 'gc')
    .by(valueMap(true))
    .by(inE().order().by('prob', Order.desc).limit(10).outV().valueMap(true).fold())

Solution

  • You should be able to use the local step in a case like this, where you want a limited number of grandchildren for each child. As I do not have your data set, here are some examples that I think map well to your use case. Before looking at the edges, here is a basic example that shows how local can help (I used a limit of 2 just to keep things simple).

    g.V('44').
      out().limit(2).
      local(out().limit(2)).
      path().
        by('code')
    

    This yields

    1   path[SAF, DFW, ASE]
    2   path[SAF, DFW, GEG]
    3   path[SAF, LAX, YLW]
    4   path[SAF, LAX, ASE]
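
    For contrast, here is a sketch of the same traversal without local, assuming the same air routes graph as above. In this form the second limit(2) applies to the whole stream of grandchildren rather than to each child, so at most two paths come back in total instead of up to two per child.

    ```groovy
    // Same traversal, but without local(): both limit(2) steps are global.
    g.V('44').
      out().limit(2).   // keep 2 children
      out().limit(2).   // keep 2 grandchildren overall, not 2 per child
      path().
        by('code')
    ```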
    

    In the air routes data set, each edge has a "dist" property (the route distance). We can use that to simulate your use case.

    g.V('44').
      outE('route').order().by('dist',desc).inV().limit(2).
      local(outE('route').order().by('dist',desc).inV().limit(2)).
      path().
        by('code').
        by('dist')
    

    We can see that this picks the longest routes at each level:

    1   path[SAF, 708, LAX, 8756, SIN]
    2   path[SAF, 708, LAX, 8372, AUH]
    3   path[SAF, 549, DFW, 8574, SYD]
    4   path[SAF, 549, DFW, 8105, HKG]
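
    Mapped back to the question, a minimal sketch might look like the following. It assumes, as in the query from the question, that children are reached by following incoming edges (inE then outV), that every such edge carries a 'prob' property, and that '123' is the starting vertex id. It has not been run against your data.

    ```groovy
    // Sketch only: edge direction, 'prob' and the id '123' are taken from the question.
    g.V('123').
      // top 10 children by edge 'prob'
      inE().order().by('prob', Order.desc).limit(10).outV().
      // for each child separately, top 10 grandchildren by edge 'prob'
      local(inE().order().by('prob', Order.desc).limit(10).outV()).
      path().
        by(valueMap(true)).   // applied to vertices: id, label and properties
        by('prob')            // applied to edges: the probability
    ```

    Unlike the project and fold version in the question, this returns one path per grandchild. A child with no grandchildren produces no path at all, so it drops out of the result. If such children must be kept, the project approach from the question may be the better fit.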