gremlin tinkerpop tinkerpop3 gremlin-server

Gremlin: dedup() with groups of vertices not working


I have a query that returns groups of users like this:

==>[britney,ladygaga,aguilera]
==>[aguilera,ladygaga,britney]

These two example groups contain the same items in a different order. The problem is that dedup() does not remove one of them, because two lists with the same items in a different order are not equal as far as dedup() is concerned.

The only solution I can think of is to call order() on each group so that they all have the same order and dedup() works. But this solution means:

  1. Extra computation just because dedup cannot handle this situation
  2. An ugly comment I have to add like "This is here to make dedup work"

Is there another solution to this?

You can try my example above in the Gremlin Console with these lines:

g.addV("user").property("name", "britney")
g.addV("user").property("name", "aguilera")
g.addV("user").property("name", "ladygaga")

Dedup working:

g.V().hasLabel("user").values("name").fold().store("result").V().hasLabel("user").values("name").fold().store("result").select("result").unfold().dedup()

Dedup not working because the items are shuffled:

g.V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").select("result").unfold().dedup()

Solution

  • You have to sort each list with order(local) so that the lists compare as equal:

    gremlin> g.V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
    ......1>   V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
    ......2>   select("result").unfold().order(local).dedup()
    ==>[aguilera,britney,ladygaga]
    

    This is standard list equality, where element order matters:

    gremlin> [1,2,3] == [1,2,3]
    ==>true
    gremlin> [1,2,3] == [3,2,1]
    ==>false
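
    If the groups should keep the order they arrived in, one possible variation is to sort only for the comparison rather than in the output. The sketch below assumes that dedup() accepts a by() modulator taking a traversal, and that `g` and the "user" vertices are the ones created in the question. It has not been checked against any particular TinkerPop version, so treat it as a starting point rather than a confirmed fix.

```groovy
// Plain Groovy: lists compare element by element, so order matters.
assert [1, 2, 3] != [3, 2, 1]
// Sorting both sides first makes the comparison order-insensitive.
assert [1, 2, 3].sort() == [3, 2, 1].sort()

// Gremlin: compare the groups by a sorted copy, but emit them unchanged.
// Assumes the same `g` and "user" vertices created in the question.
g.V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
  V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
  select("result").unfold().dedup().by(order(local))
```

    The sort still happens for every group, so this does not remove the extra computation. It only moves the sort into the dedup() comparison, which makes the intent explicit without a separate comment.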