Tags: amazon-web-services, graph, gremlin, amazon-neptune

Why does my Gremlin query return two different results?


I needed to write a query that merges two vertices, together with all their outgoing edges and properties. During development I used Gremlify and the following test data:

g.addV("TopVertex").property('id', 4713).property('testProperty1','testProperty1').as('vertex1')
.addV("SubVertex").property('name','C1').as('C1')
.addV("SubVertex").property('name','C2').as('C2')
.addV("SubVertex").property('name','C3').as('C3')
.addE("splitsInto").from('vertex1').to('C1').property('ordinal',1)
.addE("splitsInto").from('vertex1').to('C2').property('ordinal',2)
.addE("splitsInto").from('vertex1').to('C3').property('ordinal',3)

.addV("TopVertex").property('id', 4713).property('testProperty2','testProperty2').as('vertex2')
.addV("SubVertex").property('name','C4').as('C4')
.addV("SubVertex").property('name','C5').as('C5')
.addV("SubVertex").property('name','C6').as('C6')
.addE("splitsInto").from('vertex2').to('C4').property('ordinal',4)
.addE("splitsInto").from('vertex2').to('C5').property('ordinal',5)
.addE("splitsInto").from('vertex2').to('C6').property('ordinal',6)

I came up with this solution:

g.addV("MergedVertex").property('id', 4713).as('mergedVertex').
V().hasLabel("TopVertex").has('id', 4713).as('oldVertices').
       outE().as('oldEdges').
       inV().as('inVertices').
       select('mergedVertex').
       addE('splitsInto').to('inVertices').as('newEdges').
       sideEffect(select('oldEdges').properties().
                  unfold().as('props').
                  select('newEdges').
                  property(select('props').key(), select('props').value())).
       select('oldVertices').drop()

When I ran this query on Gremlify, everything worked as expected. But when I ran it on my Neptune database (engine version 1.1.1.0), only the first edge property was copied. By this I mean that the properties "ordinal 1" and "ordinal 4" are present, but the rest are missing.

I would have expected the results to be the same. Can someone explain why they differ?


Solution

  • After some fairly in-depth analysis of what is going on here, the difference in results comes down to how each system processes the drop step. The root cause is that in both cases the drop step is evaluated lazily rather than greedily, so the old vertices can be dropped before all of the traversers have passed through the earlier steps. You can see the same results on TinkerGraph if you disable LazyBarrierStrategy. For example:

    gremlin> g.withoutStrategies(LazyBarrierStrategy.class).addV("MergedVertex").property('id', 4713).as('mergedVertex').
    ......1> V().hasLabel("TopVertex").has('id', 4713).as('oldVertices').
    ......2>        outE().as('oldEdges').
    ......3>        inV().as('inVertices').
    ......4>        select('mergedVertex').
    ......5>        addE('splitsInto').to('inVertices').as('newEdges').
    ......6>        sideEffect(select('oldEdges').properties().
    ......7>                   unfold().as('props').
    ......8>                   select('newEdges').
    ......9>                   property(select('props').key(), select('props').value())).
    .....10>        select('oldVertices').drop()
    
    gremlin> g.E()
    ==>e[102][100-splitsInto->91]
    ==>e[103][100-splitsInto->79]
    
    gremlin> g.E().valueMap()
    ==>[ordinal:4]
    ==>[ordinal:1]
    

    Even more subtly, TinkerGraph produces the expected result here partly by chance. If more than 2500 results were flowing into the drop step, you would see behavior on TinkerGraph similar to what you see on Neptune, even without disabling LazyBarrierStrategy. This is because the default maximum barrier size in TinkerGraph is 2500.

    For now, on Neptune you can get the results you want by adding a barrier step to the query, so that all traversers are collected before any of them reaches drop. For example:

    g.addV("MergedVertex").property('id', 4713).as('mergedVertex').
    V().hasLabel("TopVertex").has('id', 4713).as('oldVertices').
           outE().as('oldEdges').
           inV().as('inVertices').
           select('mergedVertex').
           addE('splitsInto').to('inVertices').as('newEdges').
           sideEffect(select('oldEdges').properties().
                      unfold().as('props').
                      select('newEdges').
                      property(select('props').key(), select('props').value())).
           select('oldVertices').barrier(100000).drop()
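
    One way to check whether every edge property was copied is to list the ordinal values on the merged vertex's outgoing edges. This is only a sketch: it assumes the test data, labels, and property names from the question, and that the merge query has been run exactly once.

    ```groovy
    // List the 'ordinal' values that ended up on the new edges.
    // values('ordinal') skips edges that have no such property,
    // so missing copies show up as missing numbers in the output.
    g.V().hasLabel('MergedVertex').has('id', 4713).
      outE('splitsInto').
      values('ordinal').
      order()
    ```

    With the question's test data, each of the six ordinals should appear if the barrier had the intended effect; fewer values would suggest that the drop still ran too early.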
    

    This needs some additional discussion within the TinkerPop community about how best to adjust the behavior. For now, the barrier step should provide a solution.
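
    If you would rather not depend on barrier sizing at all, another option is to split the work into two traversals: copy first, drop afterwards. This is a sketch and an assumption on my part, not something tested on Neptune. Note that the two statements are no longer a single atomic operation unless you run them in the same transaction or session.

    ```groovy
    // Step 1: create the merged vertex and copy the edges and their properties.
    // Nothing is dropped here, so lazy evaluation cannot remove data
    // that later traversers still need.
    g.addV("MergedVertex").property('id', 4713).as('mergedVertex').
      V().hasLabel("TopVertex").has('id', 4713).
      outE().as('oldEdges').
      inV().as('inVertices').
      select('mergedVertex').
      addE('splitsInto').to('inVertices').as('newEdges').
      sideEffect(select('oldEdges').properties().as('props').
                 select('newEdges').
                 property(select('props').key(), select('props').value())).
      iterate()

    // Step 2: only once the copy has finished, remove the old vertices.
    g.V().hasLabel("TopVertex").has('id', 4713).drop().iterate()
    ```

    The iterate() calls are needed in the Gremlin console and in driver code to actually execute the traversals; if you submit the queries as text, the server runs each one to completion anyway.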