I'm trying to understand a query plan in a more complex query but for simplicity I broke it down to a simpler example. I'm not understanding why a direct edge traversal is so much faster than an until/repeat traversal.
You can set up the scenario with the following Gremlin query.
%%gremlin
g.addV('root').as('root')
.addV('person').as('person')
.addE('contains').from('root').to('person')
Notice its just a "Root" node that has a contains edge to a "Person" node.
If I run this query starting with the person vertex, the query plan is showing a 0.478ms execution time, lightning fast as expected.
%%gremlin profile
g.V('f4c17843-394d-a720-5525-bb7bedced833').as('person')
.inE('contains').outV().hasLabel('root').as('root')
Query mode | profile
Query execution time (ms) | 0.456
Request execution time (ms) | 11.103
However, if I run a slightly more convoluted query using Until/Repeat, the execution time takes 18ms, almost 40x slower.
%%gremlin profile
g.V('f4c17843-394d-a720-5525-bb7bedced833').as('person')
.until(hasLabel('root')).repeat(inE('contains').outV()).as('root')
Query mode | profile
Query execution time (ms) | 18.977
Request execution time (ms) | 33.466
I'm surprised how much slower this query is because despite doing an until/repeat step, it still only needs to traverse the 1 edge from the Person back to the Root.
Am I wrong to think these queries should run in a similar amount of time? Is there really that much overhead with Until/Repeat?
In general, the repeat
loop has a little more setup overhead and measuring it for a "single hop" traversal is probably the worst case scenario. It's also likely that the query will be slightly faster if the until
appears after the repeat
. In general, repeat
looping will perform well for multi hop traversals. Also worthy of note, the repeat
step, in the absence of a limit
or other constraint, will try to explore the graph to any depth and there is some overhead in setting that up.
You can observe this difference even using a basic TinkerGraph.
gremlin> g.V().has('code','YPO').outE().inV().has('code','YAT').profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[code.eq(YPO)]) 1 1 5.247 96.30
VertexStep(OUT,vertex) 1 1 0.142 2.62
HasStep([code.eq(YAT)]) 1 1 0.058 1.08
>TOTAL - - 5.449 -
gremlin> g.V().has('code','YPO').until(has('code','YAT')).repeat(outE().inV()).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[code.eq(YPO)]) 1 1 50.750 96.78
RepeatStep(until([HasStep([code.eq(YAT)])]),[Ve... 1 1 1.688 3.22
HasStep([code.eq(YAT)]) 0.033
VertexStep(OUT,vertex) 1 1 0.623
RepeatEndStep 0.077
>TOTAL - - 52.438 -
In general, I would not worry too much about what you observed here, as the repeat
step comes into its own when you need to traverse paths of multiple hops and is not really intended for these "one hop" patterns where there is only one possible solution (in a two node graph).