Tags: graph-databases, gremlin, amazon-neptune

How to traverse back to the root vertex in Gremlin in a single query


I am using Gremlin on Amazon Neptune. I have vertices labelled user, country and order, and these edges:
'lives_in' from user to country, 'purchased' from user to order, and 'delivered' from order to country.

Goal: find the countries with the most orders delivered to a foreign country (a country other than the purchaser's lives_in country), listed in descending order.

gremlin> g.V().hasLabel("user").outE('purchased').inV().hasLabel("order"). 
......1> outE("delivered").inV().hasLabel("country").
......2> has('name').neq(outE('lives_in').inV().hasLabel("country").values()).
......3> groupCount().by(values)

I am not able to traverse back to the root (user) vertex from inside the neq(outE("lives_in")...) step.
I get the same results after removing the last has() step:

gremlin> g.V().hasLabel("user").outE('purchased').inV().hasLabel("order").
......1> outE("delivered").inV().hasLabel("country")

This suggests that the last has() step is not filtering anything.
Sample result: {v[country_GB]=38,v[country_NZ]=6,v[country_AU]=3}


Solution

  • It's always helpful to include a small sample graph like this in your question:

    g.addV('user').as('u1').
      addV('user').as('u2').
      addV('order').as('o1').
      addV('order').as('o2').
      addV('order').as('o3').
      addV('order').as('o4').
      addV('order').as('o5').
      addV('order').as('o6').
      addV('country').property('name','usa').as('usa').
      addV('country').property('name','canada').as('can').
      addV('country').property('name','mexico').as('mex').
      addE('lives_in').from('u1').to('usa').
      addE('lives_in').from('u2').to('mex').
      addE('purchased').from('u1').to('o1').
      addE('purchased').from('u1').to('o2').
      addE('purchased').from('u1').to('o3').
      addE('purchased').from('u1').to('o4').
      addE('purchased').from('u2').to('o5').
      addE('purchased').from('u2').to('o6').
      addE('delivered').from('o1').to('usa').
      addE('delivered').from('o2').to('mex').
      addE('delivered').from('o3').to('mex').
      addE('delivered').from('o4').to('can').
      addE('delivered').from('o5').to('mex').
      addE('delivered').from('o6').to('can').iterate()


    Based on that, here's one way you might do this:

    gremlin> g.V().hasLabel("user").as('u').
    ......1>   out('lives_in').hasLabel("country").as('c').
    ......2>   select('u').
    ......3>   out('purchased').hasLabel("order").
    ......4>   out("delivered").hasLabel("country").
    ......5>   where(neq('c')).
    ......6>   groupCount().
    ......7>     by('name')
    ==>[mexico:2,canada:2]
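To make the logic of this traversal easier to check, here is a minimal plain-Python model of it. It is not Gremlin and not a Neptune API: the dictionaries are a hypothetical stand-in for the sample graph, and each comment names the Gremlin step the line mirrors. The key idea is that each user's lives_in country is bound once per user (the as('c') label) and every delivered country is compared against that binding.

```python
from collections import Counter

# Hypothetical plain-Python stand-in for the sample graph (not a Neptune API):
# user -> country it lives_in, user -> orders it purchased,
# order -> country it was delivered to.
lives_in = {"u1": "usa", "u2": "mexico"}
purchased = {"u1": ["o1", "o2", "o3", "o4"], "u2": ["o5", "o6"]}
delivered = {
    "o1": "usa", "o2": "mexico", "o3": "mexico",
    "o4": "canada", "o5": "mexico", "o6": "canada",
}


def foreign_delivery_counts(lives_in, purchased, delivered):
    """Count orders per destination country, skipping any order delivered
    to the country its purchaser lives in."""
    counts = Counter()
    for user, home in lives_in.items():            # hasLabel('user').as('u') ... as('c')
        for order in purchased.get(user, []):      # select('u').out('purchased')
            dest = delivered.get(order)            # out('delivered')
            if dest is not None and dest != home:  # where(neq('c'))
                counts[dest] += 1                  # groupCount().by('name')
    return dict(counts)


print(foreign_delivery_counts(lives_in, purchased, delivered))
# {'mexico': 2, 'canada': 2}
```

Because u1 lives in usa and u2 lives in mexico, orders o1 and o5 are dropped, which matches the Gremlin result.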
    

    A few things to note:

    1. Simplify inE().outV() and outE().inV() to just in() and out(), respectively, if you aren't doing anything with the edge.
    2. At the line marked 1, the user's "lives_in" country vertex is labelled 'c' so that it can be compared later against the "delivered" countries at line 5.
    3. The result excludes the orders labelled "o1" and "o5", because each of them was delivered to the country its purchaser lives in.
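The question also asks for the countries in descending order, which the traversal above does not do on its own. In Gremlin, a groupCount() map is commonly sorted with order(local).by(values, desc) (older TinkerPop releases spell the comparator decr). The plain-Python sketch below only illustrates that operation on a count map; the helper name sort_counts_desc is hypothetical, and the input reuses the counts from the question's sample result, keyed by the vertex ids shown there.

```python
def sort_counts_desc(counts):
    """Return (key, count) pairs ordered from highest to lowest count.

    sorted() is stable even with reverse=True, so keys with equal counts
    keep their original relative order.
    """
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Counts from the question's sample result, deliberately listed out of order.
counts = {"country_AU": 3, "country_GB": 38, "country_NZ": 6}
ranked = sort_counts_desc(counts)
print(ranked)     # [('country_GB', 38), ('country_NZ', 6), ('country_AU', 3)]
print(ranked[0])  # top country: ('country_GB', 38)
```

Taking the first pair gives the single top country; keeping the whole list gives the full descending ranking.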