I'm trying to find an efficient Gremlin query that returns each vertex together with the number of its outgoing edges. Even better would be a boolean indicating whether any outgoing edges exist, instead of the count.
Background: I'm trying to improve the performance of a program that writes some properties on vertices and then iterates over their outgoing edges to remove some of them. In many cases there are no outgoing edges, and the iteration
for (Iterator<Edge> iE = v.edges(Direction.OUT); iE.hasNext();) { ... }
consumes a significant part of the runtime. So instead of only resolving the ids to vertices (with gts.V(ids)), I want to also collect whether each vertex has outgoing edges, so that the iteration can be skipped where possible.
My first try was:
gts.V(ids).as("v").choose(__.outE(), __.constant(true), __.constant(false)).as("e").select("v", "e");
My second idea was:
gts.V(ids).project("v", "e").by().by(__.outE().count());
Both seem to work, but is there a better solution that does not require the underlying graph implementation to fetch or count all edges?
(We currently use the sqlg implementation of TinkerPop/Gremlin with PostgreSQL, and both queries seem to fetch all outgoing edges from PostgreSQL. This may be a case where some optimization is missing, but my question is not sqlg-specific.)
If you only need to know whether edges exist, then you should limit() the results in the by() modulator:
gremlin> g.V().project('v','e').by().by(outE().limit(1).count())
==>[v:v[1],e:1]
==>[v:v[2],e:0]
==>[v:v[3],e:0]
==>[v:v[4],e:1]
==>[v:v[5],e:0]
==>[v:v[6],e:1]
This way you don't count all of the edges, just the first one, which is enough to answer your question. You can return true and false instead, if you like, with a minor modification:
gremlin> g.V().project('v','e').by().by(coalesce(outE().limit(1).constant(true),constant(false)))
==>[v:v[1],e:true]
==>[v:v[2],e:false]
==>[v:v[3],e:false]
==>[v:v[4],e:true]
==>[v:v[5],e:false]
==>[v:v[6],e:true]
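Why limit(1) helps can be illustrated without a graph at all. The following is a plain-Java analogy, not TinkerPop code: a list of strings stands in for a vertex's outgoing edges, and the hypothetical helpers countAll, countUpToOne and hasAny mimic outE().count(), outE().limit(1).count() and the coalesce(...) boolean variant. A wrapping iterator records how many elements each strategy actually pulls. Whether a particular provider (such as sqlg) also pushes the limit down into its backend query depends on that provider's own optimizations.

```java
import java.util.Iterator;
import java.util.List;

public class ExistenceCheck {
    // Hypothetical wrapper that records how many elements were pulled,
    // standing in for edges fetched from the backing store.
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        int pulled = 0;

        CountingIterator(Iterator<T> delegate) { this.delegate = delegate; }

        @Override public boolean hasNext() { return delegate.hasNext(); }

        @Override public T next() { pulled++; return delegate.next(); }
    }

    // Analogue of outE().count(): drains the whole iterator.
    static long countAll(Iterator<?> it) {
        long n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    // Analogue of outE().limit(1).count(): pulls at most one element.
    static long countUpToOne(Iterator<?> it) {
        long n = 0;
        while (n < 1 && it.hasNext()) { it.next(); n++; }
        return n;
    }

    // Analogue of coalesce(outE().limit(1).constant(true), constant(false)).
    static boolean hasAny(Iterator<?> it) {
        return countUpToOne(it) > 0;
    }

    public static void main(String[] args) {
        List<String> edges = List.of("e1", "e2", "e3", "e4");

        CountingIterator<String> a = new CountingIterator<>(edges.iterator());
        System.out.println("count = " + countAll(a) + ", pulled = " + a.pulled);
        // prints "count = 4, pulled = 4"

        CountingIterator<String> b = new CountingIterator<>(edges.iterator());
        System.out.println("limit(1).count = " + countUpToOne(b) + ", pulled = " + b.pulled);
        // prints "limit(1).count = 1, pulled = 1"

        System.out.println("hasAny(empty) = " + hasAny(List.<String>of().iterator()));
        // prints "hasAny(empty) = false"
    }
}
```

The existence check costs at most one element no matter how many edges a vertex has, while the full count grows with the number of edges.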