Writing an "outer join" query in Gremlin for AWS Neptune (without using lambda steps)


I want to return a user's distinct git branches based on the following conditions:

  • Do not return the master branch
  • Return the branches with no associated pull requests
  • Return the branches associated with pull requests when none of the pull requests are open or merged.

Consider the following sample data:

user = graph.addVertex(label, 'User', 'name', 'John')
branch1 = graph.addVertex(label, 'Branch', 'name', 'branch1')
branch2 = graph.addVertex(label, 'Branch', 'name', 'branch2')
branch3 = graph.addVertex(label, 'Branch', 'name', 'branch3')
branchmaster = graph.addVertex(label, 'Branch', 'name', 'master')
user.addEdge('AUTHOR_OF', branch1)
user.addEdge('AUTHOR_OF', branch2)
user.addEdge('AUTHOR_OF', branch3)
user.addEdge('AUTHOR_OF', branchmaster)
pr2 = graph.addVertex(label, 'PullRequest', 'name', 'pr2', 'state', 'OPEN')
pr3 = graph.addVertex(label, 'PullRequest', 'name', 'pr3', 'state', 'DECLINED')
branch2.addEdge('SOURCE_OF', pr2)
branch3.addEdge('SOURCE_OF', pr3)
pr22 = graph.addVertex(label, 'PullRequest', 'name', 'pr22', 'state', 'MERGED')
branch2.addEdge('SOURCE_OF', pr22)
pr23 = graph.addVertex(label, 'PullRequest', 'name', 'pr23', 'state', 'DECLINED')
branch2.addEdge('SOURCE_OF', pr23)

I want to return branch1 (because it has no associated PRs) and branch3 (because its only associated PR is declined).

The following query does not work on AWS Neptune because Neptune does not support lambda steps:

g.V().hasLabel('User')
  .out('AUTHOR_OF')
  .hasLabel('Branch')
  .has('name', neq('master'))
  .where(out('SOURCE_OF')
    .hasLabel('PullRequest').values('state').fold()
    .filter{ !(it.get().contains('OPEN') || it.get().contains('MERGED')) })
  .dedup()
  .order().by('updated_at', desc)

Solution

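  • You can replace the lambda with not() and the within predicate. A branch passes the filter when it has no pull request whose state is OPEN or MERGED, which also covers branches with no pull requests at all: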

    g.V().hasLabel('User')
      .out('AUTHOR_OF')
      .hasLabel('Branch')
      .has('name', neq('master'))
      .where(__.not(out('SOURCE_OF').hasLabel('PullRequest').has('state', within(['OPEN','MERGED']))))
      .dedup()
      .order().by('updated_at', desc)
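
To see why the not(...) filter behaves like an "outer join", here is a minimal sketch of the same logic in plain Python rather than Gremlin. The dictionary mirrors the sample data from the question; the names pr_states_by_branch, BLOCKING_STATES and stale_branches are made up for this illustration and are not part of any Gremlin or Neptune API.

```python
# Plain-Python model of the sample data (not Gremlin).
# Branch name -> states of the pull requests the branch is the source of.
pr_states_by_branch = {
    "branch1": [],
    "branch2": ["OPEN", "MERGED", "DECLINED"],
    "branch3": ["DECLINED"],
    "master": [],
}

# States that disqualify a branch, as in within(['OPEN','MERGED']).
BLOCKING_STATES = {"OPEN", "MERGED"}


def stale_branches(pr_states):
    """Return non-master branches that have no OPEN or MERGED pull request."""
    return [
        name
        for name, states in pr_states.items()
        if name != "master"
        # Mirrors where(__.not(... has('state', within(...)))):
        # any() over an empty list is False, so branches without PRs pass.
        and not any(state in BLOCKING_STATES for state in states)
    ]


print(stale_branches(pr_states_by_branch))  # ['branch1', 'branch3']
```

The key point is that "none of the pull requests are open or merged" is the negation of "at least one pull request is open or merged", and that negation is trivially true for a branch with no pull requests, so no separate case is needed for it.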