I'm trying to write a Gremlin query that will traverse through several vertices and return the leaves along with some information about the path it followed to get there.
It's easiest to explain with an example:
# Sample graph diagram
# 1 --> 2* --> 3* --> 4
#  \     \---> 5* --> 6
#   \-> 7
# Create sample graph
g.addV('V').as('1').property('id','1').property('notable',false)
.addV('V').as('2').property('id','2').property('notable',true)
.addE('E').from('1')
.addV('V').as('3').property('id','3').property('notable',true)
.addE('E').from('2')
.addV('V').as('4').property('id','4').property('notable',false)
.addE('E').from('3')
.addV('V').as('5').property('id','5').property('notable',true)
.addE('E').from('2')
.addV('V').as('6').property('id','6').property('notable',false)
.addE('E').from('5')
.addV('V').as('7').property('id','7').property('notable',false)
.addE('E').from('1')
The following traversal starts from vertex 1 and continues out() as far as possible, collecting "notable" vertices using as().
g.V('1')
.out()
.until(out().count().is(0))
.repeat(
optional(has('notable', true).as("notables"))
.out()
)
.project('Id','NotableAncestors')
.by(id())
.by(coalesce(
select('notables').unfold().id(), inject([])
))
What I'd like to see is the ID of each leaf with an array of IDs of its "notable" ancestors:
[
{
"Id": "7",
"NotableAncestors": []
},
{
"Id": "4",
"NotableAncestors": ["2", "3"]
},
{
"Id": "6",
"NotableAncestors": ["2", "5"]
}
]
But instead of NotableAncestors being an array, I'm getting just the first value, because unfold() flattens the array to just the first item in it, as you can see below. Alternatively, if I leave out unfold(), I get an array, but it is always empty.
[
{
"Id": "7",
"NotableAncestors": []
},
{
"Id": "4",
"NotableAncestors": "2"
},
{
"Id": "6",
"NotableAncestors": "2"
}
]
I think you can simplify a bit. First, note that as() is a step label that you can reference to check which traverser was at that step at a particular point in the traversal, so it's not really "collecting" things. Here's another way to go:
gremlin> g.V('1').
......1> repeat(out()).
......2> emit(outE().count().is(0)).
......3> project('Id','NotableAncestors').
......4> by(id()).
......5> by(path().unfold().has('notable',true).id().fold())
==>[Id:7,NotableAncestors:[]]
==>[Id:4,NotableAncestors:[2,3]]
==>[Id:6,NotableAncestors:[2,5]]
I removed a bunch of extra steps and simply traversed out() repeatedly away from vertex "1", emitting only the leaf vertices, which are the ones you care about. Then I analyze the path() taken to get to each leaf for any "notable" vertices and fold those into a List for "NotableAncestors".
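If it helps to see the logic outside of Gremlin, here is a rough plain-Python model of what that traversal does. It is only a sketch: the EDGES and NOTABLE structures and the function name are made up for illustration, and it does not talk to a graph database. The idea is the same, though: carry the full path to each leaf, then filter that path for notable vertices instead of trying to collect them along the way.

```python
# Hypothetical in-memory copy of the sample graph: vertex id -> child ids.
EDGES = {"1": ["2", "7"], "2": ["3", "5"], "3": ["4"], "5": ["6"]}
# Ids of the vertices whose 'notable' property is true in the sample graph.
NOTABLE = {"2", "3", "5"}


def leaves_with_notable_ancestors(start):
    """Walk outward from start, carrying the path, roughly like repeat(out()) plus path()."""
    results = []
    stack = [(start, [start])]
    while stack:
        vertex, path = stack.pop()
        children = EDGES.get(vertex, [])
        if not children and vertex != start:
            # Leaf reached: filter the whole path for notable vertices.
            # The path includes the leaf itself, so a notable leaf would be listed too.
            results.append({
                "Id": vertex,
                "NotableAncestors": [v for v in path if v in NOTABLE],
            })
        # Each child gets its own copy of the path so branches do not share state.
        for child in reversed(children):
            stack.append((child, path + [child]))
    return results


for row in leaves_with_notable_ancestors("1"):
    print(row)
```

This sketch walks depth-first, so the rows may come out in a different order than in the Gremlin console output above, but each leaf is paired with the same list of notable ancestors.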