gremlin tinkerpop janusgraph

gremlin intersection with `select` and `as`


I'm following up on these two questions:

gremlin intersection operation

JanusGraph Gremlin graph traversal with `as` and `select` provides unexpected result

I read StackOverflow intensively (and want to thank the community!), but unfortunately I haven't posted much, so I don't have enough reputation to comment on the posts above. Therefore I'm asking my questions here.

Regarding the 2nd post above: Hieu and I work together, and I want to provide a bit more background on that question.

As Stephen asked in a comment on the 2nd post: the reason I want to chain V() in the middle is simply that I want to restart the traversal from the beginning, i.e. from each and every node of the whole graph, just as g.V() does at the start of most of the queries in the Gremlin documentation.

A bit more illustration: suppose I need 2 conditional filters on the results. Basically I want to write

g.V().(Condition-A).as('setA')
 .V().(Condition-B).as('setB')
 .select('setA')
 .where('setA',eq('setB'))

which borrows the last query from Stephen's answer in the 1st post. Here Condition-A and Condition-B are each just a chain of filter steps such as has or hasLabel.

What should I write at the place of .V() in the middle? Or is there some other way to write the query so that Condition-B is completely independent of Condition-A?

Finally, I've read the section on chaining V() in the middle of a query at https://tinkerpop.apache.org/docs/3.5.0/reference/#graph-step. I still cannot fully understand the weird results in the 2nd post; maybe I should read more about how traversers work?

Thanks again, Kelvin and Stephen. I'm glad and excited to connect with the people who wrote a book on Gremlin and the Gremlin source code.


Solution

  • In the middle of a traversal, a V() is applied to every traverser that has been created by the prior steps. Consider this example using the air-routes data set:

    g.V(1,2,3)
    

    This will yield three results:

    v[1]
    v[2]
    v[3]
    

    and if we count all vertices in the graph:

    gremlin> g.V().count()
    ==>3747 
    

    the count is 3,747. If we now do:

    gremlin> g.V(1,2,3).V().count()
    ==>11241
    

    the count is 11,241 (exactly 3 times 3,747). This is because for each of the three traversers produced by g.V(1,2,3), the second V() visited every vertex in the graph.
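
The multiplication can be mimicked with a toy model in plain Python. This is only a sketch and not TinkerPop code: the graph is assumed to be a plain list of 3,747 integer ids, and `start_v` and `mid_traversal_v` are hypothetical helpers standing in for a leading V(ids) and a V() reached by existing traversers.

```python
# Toy model, not TinkerPop: each list element stands for one traverser.
ALL_VERTICES = list(range(3747))  # assumed ids 0..3746, matching the count above


def start_v(*ids):
    """Model g.V(ids...): one traverser per requested vertex id."""
    return [v for v in ids if v in ALL_VERTICES]


def mid_traversal_v(traversers):
    """Model a V() step in the middle of a traversal.

    Every incoming traverser triggers a full scan of the graph, so the
    output holds len(traversers) * len(ALL_VERTICES) traversers.
    """
    return [v for _ in traversers for v in ALL_VERTICES]


start = start_v(1, 2, 3)
print(len(start))                   # 3
print(len(mid_traversal_v(start)))  # 11241
```

In this model the second scan cannot be independent of the first: its output size always scales with the number of traversers that reach it.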

    EDITED to add:

    If you need to aggregate some results and then explore the graph again using those results as a filter, one way is to introduce a fold step. This will collapse all of the traversers back into one again. This ensures that the second V step will not be repeated multiple times by any prior fan out.

    gremlin> g.V(1,2,3).fold().as('a').V().where(within('a'))
    ==>v[1]
    ==>v[2]
    ==>v[3]
    
    gremlin> g.V(1,2,3).fold().as('a').V().where(without('a')).limit(5)
    ==>v[0]
    ==>v[4]
    ==>v[5]
    ==>v[6]
    ==>v[7]    
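
The effect of fold can be sketched with the same kind of toy model in plain Python. Everything here is an assumption made for illustration, not TinkerPop code: a traverser is modelled as a `(value, labels)` pair, the graph is a list of ten integer ids, and `fold`, `as_`, `v` and `where` are hypothetical stand-ins for the Gremlin steps of similar name.

```python
# Toy model, not TinkerPop: a traverser is a (value, labels) pair.
ALL_VERTICES = list(range(10))  # assumed tiny graph with vertex ids 0..9


def fold(traversers):
    """Model fold(): merge all traversers into one that holds a list."""
    return [([value for value, _ in traversers], {})]


def as_(traversers, label):
    """Model as(label): remember the current value under a label."""
    return [(value, {**labels, label: value}) for value, labels in traversers]


def v(traversers):
    """Model a mid-traversal V(): each traverser visits every vertex."""
    return [(vertex, labels) for _, labels in traversers for vertex in ALL_VERTICES]


def where(traversers, label, keep_members):
    """Model where(within(label)) if keep_members, else where(without(label))."""
    return [
        (value, labels)
        for value, labels in traversers
        if (value in labels[label]) == keep_members
    ]


start = [(vertex, {}) for vertex in (1, 2, 3)]  # like g.V(1,2,3)
scanned = v(as_(fold(start), "a"))              # like .fold().as('a').V()

print(len(scanned))                                    # 10: one scan, not three
print([x for x, _ in where(scanned, "a", True)])       # [1, 2, 3]
print([x for x, _ in where(scanned, "a", False)][:5])  # [0, 4, 5, 6, 7]
```

Because only one traverser reaches `v`, the graph is scanned once, and the folded list is still available under the label `a` as a filter.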
    

    EDITED again to add:

    The key part I think people sometimes struggle with is how Gremlin traversals flow. You can think of a query as containing or spawning one or more parallel streams (it may not be executed that way, but conceptually it helps me to think of it that way). These streams are usually called traversers. So g.V('1') creates one traverser. However, g.V('1').out() might create multiple traversers if more than one outgoing edge originates from vertex '1'. When a fold is encountered, the traversers are all collapsed back down to one again.
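
The fan-out and collapse described above can be pictured with a small Python sketch. It is a conceptual model only, not how TinkerPop executes a traversal: `OUT_EDGES` is a made-up adjacency list, and `out` and `fold` are hypothetical helpers named after the Gremlin steps they imitate.

```python
# Toy model, not TinkerPop: each list element stands for one traverser.
# Hypothetical adjacency list: vertex id -> ids reached by outgoing edges.
OUT_EDGES = {1: [2, 3, 4], 2: [1], 3: [], 4: [1, 2]}


def out(traversers):
    """Model out(): each traverser splits into one traverser per outgoing edge."""
    return [target for vertex in traversers for target in OUT_EDGES[vertex]]


def fold(traversers):
    """Model fold(): gather every traverser into a single list-valued traverser."""
    return [list(traversers)]


one = [1]              # like g.V(1): a single traverser
fanned = out(one)      # like g.V(1).out(): three traversers
merged = fold(fanned)  # like g.V(1).out().fold(): one traverser again

print(len(one), len(fanned), len(merged))  # 1 3 1
print(merged[0])                           # [2, 3, 4]
```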