I want to filter my graph to only include vertices with fewer than a threshold number of edges (e.g. 50), like so:
g.V().filter(bothE().limit(50).count().is(lt(50)))
This gives me the list of vertices that I want to keep.
How can I create a traversal object which includes only these vertices?
Background
I need to compute the k-hop neighbourhood of every single vertex in a graph while filtering out vertices that have a large number of edges (e.g. 50 or more). The filtered graph has several million edges and vertices.
The first way of doing this that came to mind was to first filter the graph, store the result as a new subgraph, and then iterate over every vertex to find the k-hop neighbourhoods. For a single vertex v, the k=5-hop neighbourhood code is:
g.V(v).repeat(__.bothE().bothV()).times(5).dedup().toList()
A better way might be to iterate over every vertex in the original, unfiltered graph and ignore edges attached to a high-edge-count vertex, but I'm not sure how to do this.
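To make that second idea concrete, here is a minimal sketch of the logic in plain Python. It is not a Gremlin traversal: it assumes the graph fits in a hypothetical in-memory adjacency dict, that a vertex's edge count equals the length of its neighbour list (no parallel edges or self-loops), and the names k_hop_neighbourhood, adjacency and max_degree are made up for illustration. It walks outward from one vertex and never steps onto a vertex at or above the threshold, so no filtered subgraph has to be built first.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, start, k, max_degree=50):
    """Return the vertices within k hops of `start`, skipping any vertex
    that has `max_degree` or more edges (hypothetical helper)."""

    def is_low_degree(vertex):
        # Assumption: edge count == number of neighbours in the dict.
        return len(adjacency[vertex]) < max_degree

    # A high-degree start vertex would not survive the filter at all.
    if not is_low_degree(start):
        return set()

    seen = {start}  # the start vertex is included in its own neighbourhood
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency[vertex]:
            # Ignore edges that lead to a high-edge-count vertex.
            if neighbour not in seen and is_low_degree(neighbour):
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


# Toy undirected graph: a chain a-b-c-d plus a "hub" with 4 edges.
adjacency = {
    "a": ["b", "hub"],
    "b": ["a", "c"],
    "c": ["b", "d", "hub"],
    "d": ["c"],
    "hub": ["a", "c", "e", "f"],
    "e": ["hub"],
    "f": ["hub"],
}

# With a threshold of 4 the hub is ignored, so "a" only reaches the chain.
print(sorted(k_hop_neighbourhood(adjacency, "a", 2, max_degree=4)))  # ['a', 'b', 'c']
```

On the real graph the same pruning would have to be expressed inside the traversal itself, so this only models the intended behaviour rather than showing how to run it on the server.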
Attempt 1:
filtered_edges = g.V().filter(bothE().limit(50).count().is_(lt(50))).outE().toList()
subgraph = g.E(filtered_edges).subgraph('subGraph').cap('subGraph').next()
Unfortunately, when using gremlinpython an error is thrown (StreamClosedError: Stream is closed). Running other, possibly less expensive, queries before and after this error appears does not yield similar errors, so the connection to the gremlin shell is still there. The code also works in the gremlin shell (replacing is_ with is).
I guess this is because I'm sending so much data between the gremlin server and Python, but I'm unsure why this would be an issue.
Attempt 2:
Using the gremlin client, I've tried overwriting another traversal object named l. However, the overwrite operation (l = subgraph.traversal();) is failing.
from gremlin_python.driver import client, serializer
gremlin_client = client.Client('ws://{}:{}/gremlin'.format('localhost', 8192), 'g', message_serializer=serializer.GraphSONSerializersV3d0())
command = "filtered_edges = g.V().filter(bothE().limit(50).count().is(lt(50))).outE().toList(); subgraph = g.E(filtered_edges).subgraph('subGraph').cap('subGraph').next(); l = subgraph.traversal();"
gremlin_client.submit(command).all().result()
You can either continue your traversal from there:
s.V().filter(bothE().limit(50).count().is(lt(50))).out().has(...)....
or:
List<Vertex> list = s.V().filter(bothE().limit(50).count().is(lt(50))).toList()
s.V(list).out().has(...)....