Gremlin repeat operations from same start point


Objective

I want to generate random walks in Gremlin, and I already have a query that generates one: g.V(<start_id>).repeat(local(both().sample(1))).times(<depth>).path().
This works, but I need to generate <nb_rw_per_node> random walks per start node, and I'd like to do that in a single query if possible.

Issue

I've tried using the repeat() step, in combination with select() to do this, as follows:

g.V(<start_id>).as("start").
  repeat(
    select("start").
    repeat(
      local(
        both().sample(1)
      )
    ).times(<depth>).path()
  ).emit().times(<nb_rw_per_node>)

This yields the following results, which I don't understand (here, <depth> = 2 and <nb_rw_per_node> = 2):

gremlin> g.V(6652128).as("start").repeat(select("start").repeat(local(both().sample(1))).times(2).path()).emit().times(2)
==>path[v[6652128], v[6652128], v[95670392], v[1044704]]
==>path[v[6652128], v[6652128], v[95670392], v[1044704], path[v[6652128], v[6652128], v[95670392], v[1044704]], v[6652128], v[94818432], v[245928]]

How can I not get the first node doubled in the path?
Why does the second result consist of the elements of the first result, then the first result again as a nested path, and only then a new random walk? I expected to get another path of the same format as the first one.

Is this the correct way to generate multiple paths from the same initial node in a single query? If so, how can I correct my query?

Thanks to everyone reading and answering!


Solution

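  • When you select, you essentially add another copy of the selected element to the path. The same applies to the path() step inside your outer repeat: its result becomes the traverser's current object and is itself appended to the path history, which is why your second result contains the first path nested inside it. If you need 2 random walks from the same start, why not just list the start ID twice at the very beginning? The query then becomes something like this (using a data set I have to hand, and out() where your query uses both()):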

    gremlin> g.V(44,44).repeat(local(out().sample(1))).times(2).path()
    
    ==>[v[44],v[8],v[580]]
    ==>[v[44],v[20],v[34]]   
    

    To use nested repeat steps you will need something like this:

    gremlin> g.V('44').as('s').
    ......1>   repeat(select('s').as('start').
    ......2>          repeat(local(out().sample(1))).
    ......3>          times(4).path().from('start')).
    ......4>   times(3).
    ......5>   emit()
    
    ==>[v[44],v[31],v[271],v[149],v[4]]
    ==>[v[44],v[31],v[264],v[1],v[152]]
    ==>[v[44],v[8],v[38],v[4],v[190]] 
    

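    This last option is a little gimmicky, but also works. It stores the start vertex three times in the side-effect collection 'x', then unfolds that collection into three separate traversers.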

    gremlin> g.V(44).
    ......1>   repeat(store('x').identity()).times(3).
    ......2>   cap('x').
    ......3>   unfold().as('start').
    ......4>   repeat(local(out().sample(1))).
    ......5>   times(2).
    ......6>   path().
    ......7>     from('start')
    
    ==>[v[44],v[31],v[42]]
    ==>[v[44],v[8],v[407]]
    ==>[v[44],v[13],v[53]]
    

    In each of the last two examples, the real key is the from step, which keeps the redundant starting-vertex entries out of the returned path. Try running the queries without from to see the difference.
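
To see why select() and path() inflate the path, and what from does about it, here is a toy model in plain Python rather than Gremlin. It is only a sketch under assumptions that match the outputs shown above: every select() and every path() result is appended to the traverser's history, and from('start') keeps only the entries from the most recent 'start' label onward. The graph, the function name nested_walks, and the use_from flag are made up for illustration and are not part of any Gremlin API.

```python
import random

# Hypothetical toy graph: every vertex has outgoing neighbours, so no walk dies early.
TOY_GRAPH = {1: [2, 3], 2: [3, 1], 3: [1, 2]}


def nested_walks(graph, start, depth, walks, rng, use_from):
    """Model the nested-repeat query by tracking one traverser's path history.

    Each history entry is (object, label). This mirrors the assumed behaviour
    of select(), path() and from(); it does not run Gremlin.
    """
    history = [(start, "s")]  # g.V(start).as('s')
    results = []
    for _ in range(walks):  # outer repeat(...).times(walks).emit()
        # select('s').as('start') puts another copy of the start vertex in the history
        history.append((start, "start"))
        current = start
        for _ in range(depth):  # inner repeat(local(out().sample(1))).times(depth)
            current = rng.choice(graph[current])
            history.append((current, None))
        objects = [obj for obj, _ in history]
        if use_from:
            # path().from('start'): keep only entries from the latest 'start' label on
            last = max(i for i, (_, label) in enumerate(history) if label == "start")
            objects = objects[last:]
        results.append(objects)
        # The result of path() becomes the current object and joins the history too
        history.append((objects, None))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    print("without from:")
    for p in nested_walks(TOY_GRAPH, 1, 2, 2, rng, use_from=False):
        print("  ", p)
    print("with from:")
    for p in nested_walks(TOY_GRAPH, 1, 2, 2, rng, use_from=True):
        print("  ", p)
```

With use_from=False and a depth of 2, the two results have 4 and 8 entries, the same shape as the output in the question: the doubled start vertex, and then the first path nested inside the second. With use_from=True, every walk has depth + 1 entries and begins at the start vertex.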