Search code examples
gremlintinkerpopamazon-neptune

Using Gremlin (AWS Neptune), how can I get all paths of length n from a starting node traversing edges with specific criteria?


Starting with node 1, I want to return all the paths (edges and vertices, with id/label/properties) within n hops following any outbound edges, or following inbound edges with a property predicate (p > 50).

Ideally the paths shouldn't contain any cycles, so a path shouldn't contain the same node twice.

The property p is not present on every edge.

g.addV().property(id, 1).as('1').
  addV().property(id, 2).as('2').
  addV().property(id, 3).as('3').
  addV().property(id, 4).as('4').
  addV().property(id, 5).as('5').
  addV().property(id, 6).as('6').
  addV().property(id, 7).as('7').
  addV().property(id, 8).as('8').
  addE('pointsAt').from('1').to('2').
  addE('pointsAt').from('3').to('1').
  addE('pointsAt').from('4').to('1').property('p', 10).
  addE('pointsAt').from('5').to('1').property('p', 100).
  addE('pointsAt').from('2').to('6').
  addE('pointsAt').from('7').to('2').
  addE('pointsAt').from('8').to('2').property('p', 100).
  iterate()

Assuming we start at Vertex 1 the paths would look like:

1>2
1>2>6
1>2>8
1>5
  • 1-2 is included because it's outbound
  • 1-3 is excluded because it is inbound to 1 and doesn't have p
  • 1-4 is excluded because it is inbound and (p > 50) is false
  • 1-5 is included because it is inbound and (p > 50) is true
  • 2-6 is included because it's outbound
  • 2-7 is excluded because it is inbound to 2 and doesn't have p
  • 2-8 is included because it is inbound to 2 and p > 50

I've experimented with many different approaches and I can't seem to get anything close to what I am looking for.


Solution

  • I believe this is what you're looking for:

    g.V(1).
       repeat(
          union(
             outE(),inE().has('p',gt(50))
          ).
          otherV().simplePath()).
       emit().
       times(2).
       path().
          by(valueMap().with(WithOptions.tokens))
    

    The repeat() and times() steps dictate that this is a recursive traversal going to a depth of 2. The union() step and containing arguments follow your requirements to include all outgoing edges and only incoming edges with a p property greater than 50. The emit() step forces the repeat() step to stream all paths as they are found. If you didn't include this, you would only get the paths found of length 2 (declared in the times() step).

    To wrap this up, we use path() and by() to output the paths found and all of the ids, labels, and properties for each vertex and edge in the path.

    The output of this from the graph that you provided looks like this:

    ==>[[id:1,label:vertex],[id:0,label:pointsAt],[id:2,label:vertex]]
    ==>[[id:1,label:vertex],[id:3,label:pointsAt,p:100],[id:5,label:vertex]]
    ==>[[id:1,label:vertex],[id:0,label:pointsAt],[id:2,label:vertex],[id:4,label:pointsAt],[id:6,label:vertex]]
    ==>[[id:1,label:vertex],[id:0,label:pointsAt],[id:2,label:vertex],[id:6,label:pointsAt,p:100],[id:8,label:vertex]]