gremlin, tinkerpop, amazon-neptune

Modify a Gremlin query to add a new field alongside the existing one


I am new to Gremlin and am struggling to modify a query to get the result I need. I have a query that returns the depth of the graph:

g.withSack(0)
  .V('company_1')
  .repeat(
    outE('HAS_SHRHLDING_PC_TO')
    .sack(sum).by(constant(1))
    .inV()
    .simplePath())
    .until(not(outE()))
  .sack()
  .max()
  .aggregate('x')
  .fold()
  .V('company_1')
  .repeat(
    outE('HAS_VOTING_PC_TO')
    .sack(sum).by(constant(1))
    .inV()
    .simplePath())
    .until(not(outE()))
  .sack()
  .max()
  .aggregate('x')
  .cap('x')
  .unfold()
  .max()

The query returns a single number:

4

I am trying to modify this query to return a key-value pair rather than only the value, because I need to add one more key and value to the output.

I want to return the company_number that is passed as input along with the depth.

Sample expected output:

{'company_number': 'company_1', 'depth': 4}

I tried to use project() but was unsuccessful in achieving the desired result. Any help would be appreciated. Thanks.


Solution

  • Here's one way you might do it:

    gremlin> g = TinkerFactory.createModern().traversal()
    ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
    gremlin> g.withSack(0).V(1).as('start').
    ......1>   repeat(out().simplePath().sack(sum).by(constant(1))).
    ......2>     emit(__.not(outE())). 
    ......3>   dedup(). 
    ......4>   select('start').
    ......5>   project('v','depth').
    ......6>     by().
    ......7>     by(sack()).
    ......8>   order().by('depth',desc).
    ......9>   limit(1)
    ==>[v:v[1],depth:2]
    

    I avoid using max() so that the path history isn't lost and we can select('start') to get back to the start vertex. At that point, it's easy to project() and then order by the "depth" to get the deepest one.

    To take this a step further, if you supplied multiple start vertices you would find that the query returns only the single deepest result overall. In other words, order().limit() is applied globally across all start vertices, which is great if you just want the deepest path. If you want the deepest path per start vertex, you need to wrap the traversal in map():

    gremlin> g.withSack(0).V(1,4).as('start').
    ......1>   map(
    ......2>   repeat(out().simplePath().sack(sum).by(constant(1))).
    ......3>     emit(__.not(outE())).
    ......4>   dedup().
    ......5>   select('start').
    ......6>   project('v','depth').
    ......7>     by().
    ......8>     by(sack()).
    ......9>   order().by('depth',desc).
    .....10>   limit(1))  
    ==>[v:v[1],depth:2]
    ==>[v:v[4],depth:1]
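
    Adapted to the graph in the question, the same pattern might look like the sketch below. It is untested against that data and rests on two assumptions: that the company number is the vertex id (the question starts from V('company_1')), and that depth should be measured separately per edge label, as in the original query, which is why each label gets its own repeat() inside union().

    ```groovy
    // Sketch only: assumes the company number is the vertex id.
    g.withSack(0).V('company_1').as('start').
      union(
        // depth along shareholding edges
        repeat(out('HAS_SHRHLDING_PC_TO').simplePath().sack(sum).by(constant(1))).
          emit(__.not(outE())),
        // depth along voting edges
        repeat(out('HAS_VOTING_PC_TO').simplePath().sack(sum).by(constant(1))).
          emit(__.not(outE()))).
      select('start').
      project('company_number', 'depth').
        by(T.id).      // the start vertex id, assumed to be the company number
        by(sack()).    // number of edges walked to reach the emitted vertex
      order().by('depth', desc).
      limit(1)
    ```

    If the company number is stored as a property rather than as the id, by('company_number') would replace by(T.id). As in the original query, only paths that end at a vertex with no outgoing edges of any label are counted, so a start vertex with no such paths produces no result.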