Tags: gremlin, tinkerpop, tinkerpop3, gremlin-server

Gremlin query works or not depending on the context


I have a query (written, by the way, by Stephen Mallette in this question) that works in Gremlify, but when I paste it into my project it gives incorrect output.

So I opened Gremlify to write a data-creation query that I could then paste into the Gremlin Console for testing, and I noticed that the query doesn't work in Gremlify when it runs after the data-creation part, even though, as far as I understand, it should.

The query is this:

g.V().as('a').
  repeat(both().simplePath()).
    times(2).
  where(both().as('a')).
  path().
  map(unfold().limit(3).order().by(id).dedup().fold()).
  dedup().
  group('m').
    by(limit(local,2)).
  group('m').
    by(tail(local,2)).
  group('m').
    by(union(limit(local,1),tail(local,1)).fold()).     
  cap('m').
  unfold().
  map(select(values).unfold().unfold().order().by(id).dedup().fold()).
  dedup().
  map(unfold().values('name').fold())

Here it works, and the output is correct: https://gremlify.com/psiygozr559

Here it gives incorrect output (same graph, but created with a query): https://gremlify.com/mqw6ut0y1z

Here it gives no output at all (same as before, with a change in line 1): https://gremlify.com/fzgmzdq1omq

In my project it also gives incorrect output, even though I'm not executing anything unusual before the query like I do in the Gremlify projects above.

There is another query that does the same thing, which I wrote myself. It is less efficient, but it works correctly in all of the same situations and in my project:

https://gremlify.com/zihygx0w8e

https://gremlify.com/xsc6q8dranj

In my project I'm connecting to Gremlin Server locally, with the default configuration untouched, using Node.js.

Something is happening here that I don't understand.


Solution

  • By extending the traversal you've extended the Path. Placing addV('j') at the front of the traversal adds an extra element to the start of every path, which my original algorithm did not take into account:

    gremlin> g.addV("j").sideEffect(V().drop()).sideEffect(
    ......1>     addV("user").property("name", "luana").as("luana")
    ......2>     .addV("user").property("name", "luisa").as("luisa")
    ......3>     .addV("user").property("name", "sabrina").as("sabrina")
    ......4>     .addV("user").property("name", "marcello").as("marcello")
    ......5>     .addV("user").property("name", "mario").as("mario")
    ......6>     .addV("user").property("name", "lidia").as("lidia")
    ......7>     
    ......7>     .addE("friend").from("luana").to("luisa")
    ......8>     .addE("friend").from("luana").to("sabrina")
    ......9>     .addE("friend").from("luana").to("marcello")
    .....10>     .addE("friend").from("luana").to("mario")
    .....11>     .addE("friend").from("luana").to("lidia")
    .....12>     
    .....12>     .addE("friend").from("sabrina").to("luisa")
    .....13>     .addE("friend").from("sabrina").to("marcello")
    .....14>     .addE("friend").from("sabrina").to("mario")
    .....15>     
    .....15>     .addE("friend").from("mario").to("luisa")
    .....16>     .addE("friend").from("mario").to("marcello")
    .....17>     ).V().as('a').
    .....18>   repeat(both().simplePath()).
    .....19>     times(2).
    .....20>   where(both().as('a')).
    .....21>   path().by(label)
    ==>[j,user,user,user]
    ==>[j,user,user,user]
    ==>[j,user,user,user]
    ==>[j,user,user,user]
    ...
    ==>[j,user,user,user]
    
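    Why the extra element breaks the result can be seen with a toy Python model (an illustrative assumption, not TinkerPop code) of the step map(unfold().limit(3).order().by(id).dedup().fold()). A path is modelled as a list of integer vertex ids, and the id 99 is a made-up stand-in for the vertex created by addV('j'):

```python
# Toy model, not TinkerPop code: a Gremlin Path is modelled as a plain list of
# integer vertex ids, in the order the traverser visited them.

def first_three_sorted(path):
    """Mimic map(unfold().limit(3).order().by(id).dedup().fold()) on one path."""
    # limit(3) keeps the first three path elements;
    # order().by(id) and dedup() sort them and drop duplicates.
    return sorted(set(path[:3]))


# V().as('a').repeat(both().simplePath()).times(2) on a triangle 1-2-3
# yields a three-element path.
triangle_path = [1, 2, 3]

# The same walk when addV('j') (hypothetical id 99) was the first step:
# the extra vertex sits at the front of the path.
extended_path = [99] + triangle_path

print(first_three_sorted(triangle_path))  # [1, 2, 3]
print(first_three_sorted(extended_path))  # [1, 2, 99]
```

    With the extra leading element, limit(3) keeps the 'j' vertex and drops the last vertex of the triangle, so every later grouping step works on the wrong vertices.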

    You can account for that by naming the path you care about or otherwise limiting or filtering away that initial path element:

    gremlin> g.addV("j").sideEffect(V().drop()).sideEffect(
    ......1>     addV("user").property("name", "luana").as("luana")
    ......2>     .addV("user").property("name", "luisa").as("luisa")
    ......3>     .addV("user").property("name", "sabrina").as("sabrina")
    ......4>     .addV("user").property("name", "marcello").as("marcello")
    ......5>     .addV("user").property("name", "mario").as("mario")
    ......6>     .addV("user").property("name", "lidia").as("lidia")
    ......7>     
    ......7>     .addE("friend").from("luana").to("luisa")
    ......8>     .addE("friend").from("luana").to("sabrina")
    ......9>     .addE("friend").from("luana").to("marcello")
    .....10>     .addE("friend").from("luana").to("mario")
    .....11>     .addE("friend").from("luana").to("lidia")
    .....12>     
    .....12>     .addE("friend").from("sabrina").to("luisa")
    .....13>     .addE("friend").from("sabrina").to("marcello")
    .....14>     .addE("friend").from("sabrina").to("mario")
    .....15>     
    .....15>     .addE("friend").from("mario").to("luisa")
    .....16>     .addE("friend").from("mario").to("marcello")
    .....17>     ).V().as('a').
    .....18>   repeat(both().simplePath()).
    .....19>     times(2).
    .....20>   where(both().as('a')).
    .....21>   path().from('a').
    .....22>   map(unfold().limit(3).order().by(id).dedup().fold()).
    .....23>   dedup().
    .....24>   group('m').
    .....25>     by(limit(local,2)).
    .....26>   group('m').
    .....27>     by(tail(local,2)).
    .....28>   group('m').
    .....29>     by(union(limit(local,1),tail(local,1)).fold()).     
    .....30>   cap('m').
    .....31>   unfold().
    .....32>   map(select(values).unfold().unfold().order().by(id).dedup().fold()).
    .....33>   dedup().
    .....34>   map(unfold().values('name').fold())
    ==>[luana,luisa,sabrina,mario]
    ==>[luana,sabrina,marcello,mario]
    ==>[luana,luisa,sabrina,marcello,mario]
    

    Note line 21 above, where we simply change path() to path().from('a'), which says "start the path at the step labeled 'a'". With that change the query works again.
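    The effect of from('a') can be sketched in the same kind of toy Python model (again an assumption, not how TinkerPop stores a Path): each path element carries a set of step labels, and slicing from the first element labelled 'a' removes the leading 'j' vertex before limit(3) is applied. The function name path_from is hypothetical:

```python
# Toy model, not TinkerPop code: a path is a list of integer vertex ids, and
# labels[i] holds the step labels (as('a'), ...) attached to path[i].

def path_from(path, labels, start_label):
    """Return the part of the path that begins at the first element labelled start_label."""
    for i, element_labels in enumerate(labels):
        if start_label in element_labels:
            return path[i:]
    # This toy simply raises; it makes no claim about Gremlin's own behavior.
    raise ValueError(f"no path element is labelled {start_label!r}")


path = [99, 1, 2, 3]                   # 99: hypothetical id of the addV('j') vertex
labels = [set(), {"a"}, set(), set()]  # V().as('a') labelled the vertex with id 1

sliced = path_from(path, labels, "a")
print(sliced)                   # [1, 2, 3]
print(sorted(set(sliced[:3])))  # [1, 2, 3], the same triple as without addV('j')
```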

    Regarding your other example, which doesn't use sideEffect() to add the sample graph data, note the output of path() in that context (here with a single both() in place of the repeat()):

    gremlin> g.addV("j").sideEffect(V().drop()).
    ......1>   addV("user").property("name", "luana").as("luana").
    ......2>   addV("user").property("name", "luisa").as("luisa").
    ......3>   addV("user").property("name", "sabrina").as("sabrina").
    ......4>   addV("user").property("name", "marcello").as("marcello").
    ......5>   addV("user").property("name", "mario").as("mario").
    ......6>   addV("user").property("name", "lidia").as("lidia").
    ......7>     
    ......7>   addE("friend").from("luana").to("luisa").
    ......8>   addE("friend").from("luana").to("sabrina").
    ......9>   addE("friend").from("luana").to("marcello").
    .....10>   addE("friend").from("luana").to("mario").
    .....11>   addE("friend").from("luana").to("lidia").
    .....12>     
    .....12>   addE("friend").from("sabrina").to("luisa").
    .....13>   addE("friend").from("sabrina").to("marcello").
    .....14>   addE("friend").from("sabrina").to("mario").
    .....15>     
    .....15>   addE("friend").from("mario").to("luisa").
    .....16>   addE("friend").from("mario").to("marcello").
    .....17>   V().as('a').both().path()
    ==>[v[712],v[713],v[715],v[717],v[719],v[721],v[723],e[725][713-friend->715],e[726][713-friend->717],e[727][713-friend->719],e[728][713-friend->721],e[729][713-friend->723],e[730][717-friend->715],e[731][717-friend->719],e[732][717-friend->721],e[733][721-friend->715],e[734][721-friend->719],v[721],v[715]]
    ==>[v[712],v[713],v[715],v[717],v[719],v[721],v[723],e[725][713-friend->715],e[726][713-friend->717],e[727][713-friend->719],e[728][713-friend->721],e[729][713-friend->723],e[730][717-friend->715],e[731][717-friend->719],e[732][717-friend->721],e[733][721-friend->715],e[734][721-friend->719],v[721],v[719]]
    ...
    ==>[v[712],v[713],v[715],v[717],v[719],v[721],v[723],e[725][713-friend->715],e[726][713-friend->717],e[727][713-friend->719],e[728][713-friend->721],e[729][713-friend->723],e[730][717-friend->715],e[731][717-friend->719],e[732][717-friend->721],e[733][721-friend->715],e[734][721-friend->719],v[719],v[721]]
    

    Because you added the vertices and edges outside of sideEffect(), they become part of the path, as that output shows. Therefore simplePath() immediately filters away every traverser that starts from V().as('a'):

    ==>[v[712],v[713],v[715],v[717],v[719],v[721],v[723],e[725][713-friend->715],e[726][713-friend->717],e[727][713-friend->719],e[728][713-friend->721],e[729][713-friend->723],e[730][717-friend->715],e[731][717-friend->719],e[732][717-friend->721],e[733][721-friend->715],e[734][721-friend->719],v[721],v[715]]
    

    See how v[721] appears twice: once from addV() and once from V(). simplePath() sees that you traversed that vertex and then returned to it, so it filters that traverser out.
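    The filtering rule can be modelled in a few lines of Python. This is a simplified sketch of simplePath() semantics under the assumption that a traverser is kept only when no object occurs twice in its path; the vertex strings are copied from the output above, and the edges are left out for brevity:

```python
# Toy model of simplePath(): a traverser survives only if no object
# occurs more than once in its path.

def is_simple(path):
    return len(path) == len(set(path))


# Vertices already placed in the path by the addV() chain (edges omitted).
prefix = ["v[712]", "v[713]", "v[715]", "v[717]", "v[719]", "v[721]", "v[723]"]

# V() then emits v[721] and both() moves to v[715].
walk = prefix + ["v[721]", "v[715]"]

print(is_simple(walk))                  # False: v[721] and v[715] are already in the path
print(is_simple(["v[721]", "v[715]"]))  # True: the same walk without the addV() prefix
```

    Since every vertex that V() can emit was already added to the path by the addV() chain, no traverser in this toy survives the check.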

    My approach to debugging this (as the answer was not immediately clear) was to first profile() the two traversals and compare the counts of the corresponding sections. I noted where they started to differ, which put me in the vicinity of where the problem started. From there I executed the query up to those steps side by side until I noticed the difference in output around the Path. You can learn a bit about how to pick apart and debug Gremlin queries here.
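    The first step of that approach, spotting where two profile() runs stop agreeing, can be sketched in Python. This assumes you have already copied the per-step counts out of each profile() output by hand as (step name, count) pairs, aligned so that like sections sit at the same index; the function name first_divergence is hypothetical:

```python
from typing import Optional, Sequence, Tuple

# (step name, count) pair copied by hand from one row of profile() output.
StepCount = Tuple[str, int]


def first_divergence(a: Sequence[StepCount], b: Sequence[StepCount]) -> Optional[int]:
    """Return the index of the first row whose step name or count differs.

    If one list is a prefix of the other, the index just past the shorter list
    is returned; if the lists are identical, None is returned.
    """
    for i, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

    The returned index points at the section of the query to execute step by step in both versions, which is where the difference around the Path became visible in this case.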