gremlin tinkerpop janusgraph tinkerpop3 amazon-neptune

Anonymous traversal vs normal traversal gremlin


I have read the documentation about anonymous traversals. I understand that they are started with __ and that they can be used inside step modulators, but I don't understand them conceptually. Why can't we use a normal traversal spawned from a graph traversal source inside step modulators? For example, take the following Gremlin code that creates an edge:

        // inE(), outV(), and addE() are statically imported from __ (anonymous traversals)
        this.g
            .V(fromId) // look up the source vertex by id
            .as("fromVertex") // label it "fromVertex" for later reference
            .V(toId) // look up the destination vertex by id
            .coalesce( // try the child traversals in order; use the first that emits an element
                inE(label) // incoming edges with the given label
                    .where( // keep the edge only if ...
                        outV() // ... its source (out) vertex
                            .as("fromVertex")), // ... is the vertex labeled "fromVertex"
                addE(label) // otherwise add a new edge
                    .property(T.id, id) // with the given id
                    .from("fromVertex")) // from the source vertex
            .next(); // iterate the traversal so the lookup or insert actually runs

Why are __.inE() and __.addE() anonymous? Why can't we write this.g.inE() and this.g.addE() instead? Either way, the compiler does not complain. So what special benefit does an anonymous traversal give us here?


Solution

  • TL;DR: As of TinkerPop 3.5.0, users are prevented from using a traversal spawned from a GraphTraversalSource as a child traversal and must use __ instead, so you can already expect to see this enforced in the latest release.

    More historically speaking...

    A GraphTraversalSource, your g, is meant to spawn new traversals from start steps, with the source's configuration assigned to them. An anonymous traversal is spawned "blank" and is meant to take on the internal configuration of the parent traversal it is assigned to. A traversal spawned from g can have its internal configuration overwritten when it is assigned to a parent, but the design does not guarantee that this always works, so you take a chance by relying on that behavior.
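
To make the configuration point concrete, here is a toy model in plain Java. It is not TinkerPop code: `ToySource`, `ToyTraversal`, `anonymous()`, and `child()` are hypothetical names invented for this sketch, and the single `config` string stands in for whatever a real source carries. It only illustrates the idea that a traversal spawned from a source arrives with that source's configuration already bound, while an anonymous traversal starts blank and adopts the parent's configuration when it is attached as a child.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT TinkerPop's API.
final class ToyTraversal {
    String config; // null means "blank": no configuration bound yet
    final List<String> steps = new ArrayList<>();
    final List<ToyTraversal> children = new ArrayList<>();

    ToyTraversal(String config) {
        this.config = config;
    }

    // Stand-in for __: an anonymous traversal carries no configuration of its own.
    static ToyTraversal anonymous() {
        return new ToyTraversal(null);
    }

    ToyTraversal step(String name) {
        steps.add(name);
        return this;
    }

    // Stand-in for a step that accepts a child traversal, such as where() or coalesce().
    ToyTraversal child(String name, ToyTraversal child) {
        if (child.config != null) {
            // Assumption of this sketch: reject children that were spawned from a source,
            // loosely mirroring the restriction the answer describes for 3.5.0.
            throw new IllegalArgumentException("child traversal must be anonymous");
        }
        child.config = this.config; // the blank child adopts the parent's configuration
        steps.add(name);
        children.add(child);
        return this;
    }
}

// Stand-in for g: every traversal it spawns is pre-bound to the source's configuration.
final class ToySource {
    final String config;

    ToySource(String config) {
        this.config = config;
    }

    ToyTraversal V() {
        return new ToyTraversal(config).step("V");
    }
}
```

In this model, `new ToySource("cfg").V().child("where", ToyTraversal.anonymous().step("outV"))` leaves the child with the parent's `"cfg"`, whereas passing another source-spawned traversal as the child throws, because it already carries a configuration of its own.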

    Another point is that, of the full list of Gremlin steps, only a few are actually "start steps" (e.g. addV(), addE(), inject(), V(), E()), so a child traversal built from g can only ever begin with one of those. Since you often need the full list of Gremlin steps to start a child traversal, it is better to simply prefer __. Following this convention consistently also avoids confusion about why child traversals "sometimes start with g and other times start with __" when the two are used interchangeably in a single traversal.

    There are perhaps other technical reasons why the __ is required. An easy one to see that doesn't require a ton of explanation can be demonstrated in the following Gremlin Console snippet:

    gremlin> __.addV('person').steps[0].class
    ==>class org.apache.tinkerpop.gremlin.process.traversal.step.map.AddVertexStep
    gremlin> g.addV('person').steps[0].class
    ==>class org.apache.tinkerpop.gremlin.process.traversal.step.map.AddVertexStartStep
    

    The two traversals do not produce analogous steps. If using g in place of __ works today, it does so by coincidence and not by design, which means it could break in the future.
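
Why the step class matters can be sketched with another toy model in plain Java. This is not TinkerPop's implementation: `ToySteps`, `addVertexStart`, and `addVertex` are hypothetical names, and traversers are modeled as plain strings. The sketch assumes the behavior the two class names in the console output suggest, namely that a start step generates its own output once and ignores whatever flows into it, while a mid-traversal step produces one result per incoming traverser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Toy model only: hypothetical names, NOT TinkerPop's implementation.
final class ToySteps {

    // Start-step flavor (compare AddVertexStartStep in the console output):
    // emits a single vertex and ignores any incoming traversers.
    static Function<List<String>, List<String>> addVertexStart(String label) {
        return incoming -> Collections.singletonList("v[" + label + "]");
    }

    // Mid-traversal flavor (compare AddVertexStep in the console output):
    // emits one vertex for each incoming traverser.
    static Function<List<String>, List<String>> addVertex(String label) {
        return incoming -> {
            List<String> out = new ArrayList<>();
            for (String traverser : incoming) {
                out.add("v[" + label + "] from " + traverser);
            }
            return out;
        };
    }
}
```

Applied to three incoming traversers, `ToySteps.addVertex("person")` returns three results and `ToySteps.addVertexStart("person")` returns one. A child traversal sits in the middle of its parent, so under these assumptions it is the per-traverser flavor that a parent step would expect.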