I am looking at a kryo file with the following vertices
# Tree Vertices
V(label=tree, properties={treeId:1, treeName:treeA})
V(label=tree, properties={treeId:2, treeName:treeB})
# Root Node Vertices
V(label=node, properties={treeId:1, nodeId:111, nodeType:root})
V(label=node, properties={treeId:2, nodeId:222, nodeType:root})
There are no edges between the vertices labeled as tree and the vertices labeled as node. There are further edges nodes connected to the root nodes but they are irrelevant to this question. I do not want to add any edges as this graph file gets vended to me and I am treating it as read-only.
Now I want to join/project the treeNames into a traversal over the root nodes.
g.V()
.hasLabel('node').has('nodeType', 'root')
.project('nodeId', 'treeId', 'treeName') # return nodeId, treeId, treeName for each root node
.by(values('nodeId'))
.by(values('treeId'))
.by(""" # pseudo-sqlish gremlin to clarify my intent
select treeName
from V().hasLabel('tree')
.where(values('treeid'), eq($thisNode.values('treeId'))
"""
)
In SQL terms I'd say: I want to run a subquery (fully independent sub traversal starting from scratch) and then join it with my outer traversal on a given property. And again: No edge between trees and roots.
WITH
trees as (SELECT treeId, treeName FROM vertices v WHERE v.label = 'tree'),
roots as (SELECT nodeId, treeId FROM vertices v where v.label = 'node')
SELECT roots.nodeId, roots.treeId, trees.treeName
FROM roots
JOIN trees ON (roots.treeId, trees.treeId)
So I am looking for a way to perform a projection based on another traversal + one of the returned vertex properties
You can do it by starting a new traversal inside the project
like this:
g.V().hasLabel('node').
has('nodeType', 'root').as('root').
project('nodeId', 'treeId', 'treeName').
by(values('nodeId')).
by(values('treeId')).
by(coalesce(
V().hasLabel('tree').where(eq('root')).
by('treeId').
values('treeName'),
constant('tree not exist')
))
see the example here: https://gremlify.com/bybp7s9mdia
How abusive is this: Very.
starting a sub-query for each node vertex can be very 'heavy' performance-wise. and it's missing all of the advantages of graph DB if your graph schema doesn't fit your requirement