Tags: gremlin, janusgraph, gremlin-server

Create new traversal object from a list of vertices


I want to filter my graph to include only vertices with fewer than a threshold number of edges (e.g. 50), like so:

g.V().filter(bothE().limit(50).count().is(lt(50)))

This gives me the list of vertices that I want to keep.

How can I create a traversal object which includes only these vertices?

Background

I need to compute the k-hop neighbourhood of every single vertex in a graph while filtering out vertices that have a large number of edges (e.g. 50 or more). The filtered graph has several million edges and vertices.

The first approach that came to mind was to filter the graph, store the result as a new subgraph, and then iterate over every vertex to find the k-hop neighbourhoods. For a single vertex v, the k=5-hop neighbourhood code is:

g.V(v).repeat(__.bothE().bothV()).times(5).dedup().toList()

A better way might be to iterate every vertex in the original, unfiltered graph and to ignore edges attached to a high-edge-count vertex, but I'm not so sure how to do this.

Attempt 1:

filtered_edges = g.V().filter(bothE().limit(50).count().is_(lt(50))).outE().toList()
subgraph = g.E(filtered_edges).subgraph('subGraph').cap('subGraph').next()

Unfortunately, when using gremlinpython an error is thrown (StreamClosedError: Stream is closed). Running other, possibly less expensive, queries before and after this error does not produce similar errors, so the connection is still alive. The code also works in the gremlin shell (replacing is_ with is).

I guess this is because I'm sending so much data between the gremlin server and Python, but I'm unsure why this would be an issue.

Attempt 2:

Using the gremlin client, I tried to overwrite an existing traversal object named l with the subgraph's traversal. However, the overwrite operation (l = subgraph.traversal();) fails.

from gremlin_python.driver import client, serializer

gremlin_client = client.Client('ws://{}:{}/gremlin'.format('localhost', 8192), 'g', message_serializer=serializer.GraphSONSerializersV3d0())

command = "filtered_edges = g.V().filter(bothE().limit(50).count().is(lt(50))).outE().toList(); subgraph = g.E(filtered_edges).subgraph('subGraph').cap('subGraph').next(); l = subgraph.traversal();"
gremlin_client.submit(command).all().result()

Solution

  • You can either continue your traversal from there:

    s.V().filter(bothE().limit(50).count().is(lt(50))).out().has(...)....
    

    or:

    List<Vertex> list = s.V().filter(bothE().limit(50).count().is(lt(50))).toList()
    s.V(list).out().has(...)....