Why is Gremlin query using Until/Repeat so much less performant than direct edge traversal?

I'm trying to understand a query plan in a more complex query but for simplicity I broke it down to a simpler example. I'm not understanding why a direct edge traversal is so much faster than an until/repeat traversal.

You can set up the scenario with the following Gremlin query.

%%gremlin 
g.addV('root').as('root')
.addV('person').as('person')
.addE('contains').from('root').to('person')

Notice its just a "Root" node that has a contains edge to a "Person" node.

If I run this query starting with the person vertex, the query plan is showing a 0.478ms execution time, lightning fast as expected.

%%gremlin profile
g.V('f4c17843-394d-a720-5525-bb7bedced833').as('person')
.inE('contains').outV().hasLabel('root').as('root')

Query mode                                      | profile
Query execution time (ms)                       | 0.456
Request execution time (ms)                     | 11.103

However, if I run a slightly more convoluted query using Until/Repeat, the execution time takes 18ms, almost 40x slower.

%%gremlin profile
g.V('f4c17843-394d-a720-5525-bb7bedced833').as('person')
.until(hasLabel('root')).repeat(inE('contains').outV()).as('root')

Query mode                                      | profile
Query execution time (ms)                       | 18.977
Request execution time (ms)                     | 33.466

I'm surprised how much slower this query is because despite doing an until/repeat step, it still only needs to traverse the 1 edge from the Person back to the Root.

Am I wrong to think these queries should run in a similar amount of time? Is there really that much overhead with Until/Repeat?

Solution

In general, the repeat loop has a little more setup overhead and measuring it for a "single hop" traversal is probably the worst case scenario. It's also likely that the query will be slightly faster if the until appears after the repeat. In general, repeat looping will perform well for multi hop traversals. Also worthy of note, the repeat step, in the absence of a limit or other constraint, will try to explore the graph to any depth and there is some overhead in setting that up.

You can observe this difference even using a basic TinkerGraph.

gremlin> g.V().has('code','YPO').outE().inV().has('code','YAT').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[code.eq(YPO)])                                 1           1           5.247    96.30
VertexStep(OUT,vertex)                                                 1           1           0.142     2.62
HasStep([code.eq(YAT)])                                                1           1           0.058     1.08
                                            >TOTAL                     -           -           5.449        -

gremlin> g.V().has('code','YPO').until(has('code','YAT')).repeat(outE().inV()).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[code.eq(YPO)])                                 1           1          50.750    96.78
RepeatStep(until([HasStep([code.eq(YAT)])]),[Ve...                     1           1           1.688     3.22
  HasStep([code.eq(YAT)])                                                                      0.033
  VertexStep(OUT,vertex)                                               1           1           0.623
  RepeatEndStep                                                                                0.077
                                            >TOTAL                     -           -          52.438        -

In general, I would not worry too much about what you observed here, as the repeat step comes into its own when you need to traverse paths of multiple hops and is not really intended for these "one hop" patterns where there is only one possible solution (in a two node graph).