Search code examples
gremlintinkerpoptinkerpop3gremlin-server

Gremlin: Cannot go back to a previous step after calling fold() or count()


This query does not return anything because calling fold() removes all the previous as() stored:

g.V()
.hasLabel('user')
.project("user")
.by(
    as("singleUser")
    .V()
    .fold()
    .choose(
        count(local).is(gt(1)),
        select('singleUser'),
        unfold()
    )
)

Of course I can always solve this kind of problem at the expense of performance, duplicating the cost by searching 2 times, I'm looking for a better solution than that.

Also replacing as() by store() gives a different output, so that also is not a solution. store() survives to fold() but called multiple times with the same string adds each call to a list and as() replaces the first call with the second call, not the same tool.

You can try yourself: https://gremlify.com/tgq24psdfri

EDIT:

An example closer to my real query is this:

g.V()
.hasLabel('user')
.project("u")
.by(
    as("appUser")
    .both("friend")
    .project("result")
    .by(
        as("appUserFriend")
        .choose(
            both("friend").where(bothE('friend').where(bothV().as('appUser'))).count().is(lt(2)),
            constant("too small").fold(),
            union(
                both("friend").where(bothE('friend').where(bothV().as('appUser'))),
                select("appUserFriend")
            ).order().by("name").values("name").fold()
        )
    ).select(values).unfold()
).select(values).unfold().dedup()

This query finds all possible "groups of friends". To form a group of friends each member needs to be friend of at least 2 other friend users (at least a triangle). The query works but also generates groups of 2 total members, that is when the condition of 2 friends is not met so these groups are ignored for being "too small".

You can run the query here: https://gremlify.com/lu64acieuw

The query runs and the output is correct, but notice in line numbers 11 and 14 (in gremlify) the search is the same, to improve performance I want to call select() to go back instead of writing the same search, but it's not possible because of the problem of this question. Any other trick to not writing the same search 2 times is welcome.

This is a description of how it works step by step:

  1. Select all the users of the application, let's call them "appUser"
  2. Select all appUser's friends, let's call them "appUserFriend"
  3. Select the friends of "appUserFriend" that are also friends of "appUser" and add them into an array
  4. Include in the array the "appUserFriend" and the "appUser"
  5. Remove duplicates

Solution

  • I'm going to assume that you don't really care about the "too small" group given the way you wrote your question and that this question is about the algorithm you described in enumeration of its steps at the end. With that assumption in mind I notice that you are basically detecting triangles and then trying to group them accordingly. Cycle detection is discussed in Gremlin Recipes here and the pattern is basically:

    g.V().as("a").repeat(both().simplePath()).times(2).where(both().as("a")).path()
    

    or with duplicated paths removed:

    g.V().as("a").repeat(both().simplePath()).times(2).where(both().as("a")).path().
      dedup().by(unfold().order().by(id).dedup().fold())
    

    With that as a basis you then just need to convert those results into the groups you are looking for. You could likely do that in your own application code outside of Gremlin if you find that more efficient, but one way to do it with Gremlin would involve grouping all the pairs within the triangles and then combining the elements of those paths that grouped:

    g.V().as('a').
      repeat(both().simplePath()).
        times(2).
      where(both().as('a')).
      path().
      map(unfold().limit(3).order().by(id).dedup().fold())
      dedup().
      group('m').
        by(limit(local,2)).
      group('m').
        by(tail(local,2)).
      group('m').
        by(union(limit(local,1),tail(local,1)).fold()).     
      cap('m').
      unfold().
      map(select(values).unfold().unfold().order().by(id).dedup().fold()).
      dedup().
      map(unfold().values('name').fold())
    

    There might yet be a better way but I think this query at least saves you from querying and re-querying the same paths again and again. I also think it's easier to follow, as once the reader notices the triangle counting pattern the rest is free of backtracking and a lot of hierarchy. It would be curious to see if triangle processing into groups was better handled in Gremlin or in your application code. That might be worth exploring.

    I'm not sure how big your graph is, but this particular query might be better suited to OLAP style processing with Spark and a custom VertexProgram, something perhaps similar to connectedComponent().