Search code examples
gremlintinkerpopamazon-neptune

Gremlin query with variable edge depth without using Union


I'm trying to put together a Gremlin query that returns results for 1 to n depth of a certain edge type - without having to resort to using multiple queries stitched together with .union().

I have some test data that simulates the structure of sales offices and people that work in them, including who manages which offices and which offices "roll up" under the jurisdiction of which higher level offices. The following screen shot (from Neo4j, actually) shows a subset of the graph that I'm going to reference.

Graph of sample data

The graph can be created with the following:

g.
addV('Office').as('O_111').property('code','111').
addV('Office').as('O_356').property('code','356').
addV('Office').as('O_279').property('code','279').
addV('Office').as('O_KC5').property('code','KC5').
addE('MERGES_INTO').from('O_356').to('O_111').
addE('MERGES_INTO').from('O_279').to('O_356').
addE('MERGES_INTO').from('O_KC5').to('O_279').
addV('Person').as('Bob').property('name','Bob').
  addE('MANAGES').from('Bob').to('O_111').addE('WORKS_WITH').from('Bob').to('O_111').
addV('Person').as('Michael').property('name','Michael').addE('WORKS_WITH').from('Michael').to('O_111').
addV('Person').as('John').property('name','John').addE('WORKS_WITH').from('John').to('O_111').
addV('Person').as('Rich').property('name','Rich').addE('WORKS_WITH').from('Rich').to('O_111').
addV('Person').as('Matt').property('name','Matt').
  addE('WORKS_WITH').from('Matt').to('O_279').addE('MANAGES').from('Matt').to('O_279').
addV('Person').as('Judy').property('name','Judy').addE('WORKS_WITH').from('Judy').to('O_279').
addV('Person').as('Joe').property('name','Joe'). addE('WORKS_WITH').from('Joe').to('O_279').
addV('Person').as('Ben').property('name','Ben').addE('WORKS_WITH').from('Ben').to('O_279').
addV('Person').as('Ron').property('name','Ron').addE('WORKS_WITH').from('Ron').to('O_KC5').

If I want to see which people (orange) that work with an office (pink) that Bob directly or indirectly manages (because, for example, offices KC5, 279, and 356 roll up to Bob's 111 office), I can use .union() and something like the following to get the proper results:

gremlin> g.V().has('Person','name','Bob').
......1>   out('MANAGES').
......2>   union(
......3>     __.in('WORKS_WITH'),
......4>     __.in('MERGES_INTO').in('WORKS_WITH'),
......5>     __.in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH'),
......6>     __.in('MERGES_INTO').in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH')
......7>     ).
......8>   values('name').fold()
==>[Bob, Michael, John, Rich, Matt, Judy, Joe, Ben, Ron]

That seems super verbose and awkward. Is that my only choice? Is there a better way that doesn't seem so redundant like .union()?

Coming from a Neo4j world, I'd just do something with a ranged depth of "0 or more" using *0.., like this:

MATCH (manager:Person {name:'Bob'}) 
OPTIONAL MATCH (manager)-[:MANAGES]->(:Office)<-[:MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(p:Person)
RETURN p

How do I achieve the same sort of thing in Gremlin? Even if I can't do open ended, but could do 1 to some arbitrary limit (say, 1 to 10), that would work. It probably wouldn't matter, but I will be using AWS Neptune for the actual Graph database.


Solution

  • When asking questions about Gremlin, a picture of your graph is nice, but a script that provides some sample data is even better - like this:

    g.addV('person').property('name','michael').as('mi').
      addV('person').property('name','john').as('jo').
      addV('person').property('name','rich').as('ri').
      addV('person').property('name','bob').as('bo').
      addV('person').property('name','matt').as('ma').
      addV('person').property('name','ron').as('ro').
      addV('person').property('name','joe').as('joe').
      addV('person').property('name','ben').as('be').
      addV('person').property('name','judy').as('ju').
      addV('office').property('name','111').as('111').
      addV('office').property('name','356').as('356').
      addV('office').property('name','279').as('279').
      addV('office').property('name','kc5').as('kc5').
      addE('mergesInto').from('kc5').to('279').
      addE('mergesInto').from('279').to('356').
      addE('mergesInto').from('356').to('111').
      addE('worksWith').from('mi').to('111').
      addE('worksWith').from('jo').to('111').
      addE('worksWith').from('ri').to('111').
      addE('worksWith').from('bo').to('111').
      addE('manages').from('bo').to('111').
      addE('worksWith').from('ma').to('279').
      addE('manages').from('ma').to('279').
      addE('worksWith').from('joe').to('279').
      addE('worksWith').from('be').to('279').
      addE('worksWith').from('ju').to('279').
      addE('worksWith').from('ro').to('kc5').iterate()
    

    Your instincts are correct where union() isn't quite right for what you want to do. I would prefer repeat():

    gremlin> g.V().has('person','name','bob').
    ......1>   out('manages').
    ......2>   repeat(__.in('worksWith','mergesInto')).
    ......3>     emit(hasLabel('person')).
    ......4>   values('name')
    ==>bob
    ==>michael
    ==>john
    ==>rich
    ==>matt
    ==>joe
    ==>ben
    ==>judy
    ==>ron
    

    In this way it traverses to arbitrary depth (though we tend to recommend setting some kind of sensible limit to avoid problems if you run into some unexpected cycle somewhere) and is much more succinct. Note the use of emit() which controls which types of vertices are returned from the repeat() - if you do not include that filter you will also return "office" vertices.