With the following graph:
g.addV("entity").property(id, "A")
.addV("entity").property(id, "B")
.addV("entity").property(id, "C")
.addV("entity").property(id, "D")
.addV("entity").property(id, "E")
.addV("entity").property(id, "F")
.addV("entity").property(id, "G")
.addV("entity").property(id, "H")
.addE("needs").from(g.V("A")).to(g.V("B"))
.addE("needs").from(g.V("A")).to(g.V("C"))
.addE("needs").from(g.V("A")).to(g.V("D"))
.addE("needs").from(g.V("A")).to(g.V("E"))
.addE("needs").from(g.V("C")).to(g.V("F"))
.addE("needs").from(g.V("D")).to(g.V("F"))
.addE("needs").from(g.V("E")).to(g.V("G"))
.addE("needs").from(g.V("F")).to(g.V("H"))
One variant of a correct order would be
[[B, H, G], [F, E], [C, D], [A]]
or just
[B, H, G, F, E, C, D, A]
The order within a "group" (e.g. [B, H, G] vs [H, G, B]) is not important.
How do I retrieve the correct order to resolve these nodes if they were to live in a dependency graph?
My thinking of the algorithm would be along the lines of one of these two solutions:
current
and resolved
.current
and see if all of their children (the vertices they reference to) have are contained within resolved
. If they are, save the current vertex into resolved
and append to result set. Save the vertex' parents into current
.current
has elements, go to 2, else finish.resolved
.resolved
and that do not have any referenced vertices within resolved
. Append those to the result set. If there are no vertices, return result set, else repeat 2.Is this possible? If so, how? Can I use these current
and resolved
collections while traversing?
In a large graph this is not likely to be an efficient query but here is one solution. Note that the list after the initial set is in the reverse order from your question as that is the order in which those vertices are encountered. A more efficient approach would be perhaps to reverse the "tree" and to start from known roots rather than "all leafs".
gremlin> g.V().where(__.not(out())).store('a').
......1> repeat(__.in()).emit().
......2> until(__.not(__.in())).
......3> dedup().
......4> union(cap('a'),identity()).
......5> fold()
==>[[v[A1],v[A4]],v[A0],v[A3],v[A2]]
UPDATED
Given your new data set I had to take a different approach. My prior query was using a breadth first search but that does not yield the ordering that you need. I changed the approach to use the index
step and for each vertex found only keep the one found with the highest index value in a path. The resultant list is the one you need, just in reverse order. We can also add to the query to reverse the list - shown further below.
gremlin> g.V().where(__.not(out())).
......1> repeat(__.in()).
......2> until(__.not(__.in())).path().index().unfold().
......3> order().by(tail(local),desc).limit(local,1).dedup().fold()
==>[v[A],v[C],v[D],v[E],v[F],v[B],v[G],v[H]]
If you want to reverse the list as part of the query you can also do that.
gremlin> g.V().where(__.not(out())).
......1> repeat(__.in()).
......2> until(__.not(__.in())).path().index().unfold().
......3> order().by(tail(local),desc).limit(local,1).dedup().fold().index().
......4> unfold().order().by(tail(local),desc).limit(local,1).fold()
==>[v[H],v[G],v[B],v[F],v[E],v[D],v[C],v[A]]