gremlin, graph-databases, tinkerpop, janusgraph

Efficient way for filtering related nodes connected to a supernode


I am trying to filter the nodes connected to a supernode in JanusGraph. Millions of vertices can be connected directly to the supernode (at level 1): a parent node may have millions of directly connected vertices in different categories, say category1, category2, up to category n.

I am using JanusGraph 1.0 with Apache Cassandra as the storage backend. The Gremlin query below fetches the related items in a given category, say category1, and it takes more than 20-30 seconds to return the result.

For recId 123 below, more than 500K records are connected directly to the supernode:

g.V().has("recId","123").both("related").has("category","category1").range(0,1000).valueMap("recId","category").toList();

Is there any way to optimize this Gremlin query, or to run it in parallel, so that it returns results in a few seconds? Please suggest.

Note that indexes have already been created for the recId and category fields.


Solution

  • In general, it is not a good idea to retrieve a supernode into client memory, because all of its edges come with it. However, once you know the id of the supernode, you can start from some other node and check whether it connects to the supernode. If you really want to traverse through the supernode, see the JanusGraph documentation on vertex-centric indexes: https://docs.janusgraph.org/schema/index-management/index-performance/#vertex-centric-indexes
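
A rough sketch of both ideas for the Gremlin Console follows. A vertex-centric index covers properties stored on the edges themselves, not properties of the neighbouring vertices, so the sketch assumes the category is also written onto every "related" edge as an edge property. That denormalisation is an assumption and is not part of the schema described in the question. The index name relatedByCategory and the second recId 456 are made up for illustration, and the create/wait/reindex sequence follows the pattern in the linked documentation, so check it against your JanusGraph version before relying on it.

```groovy
// Gremlin Console sketch. Assumes `graph` is an open JanusGraph instance
// and g = graph.traversal().
// ASSUMPTION: every "related" edge carries an edge property "category"
// holding the category of the item it points to.

graph.tx().rollback()   // no transaction should be open while changing the schema
mgmt = graph.openManagement()
related  = mgmt.getEdgeLabel('related')
category = mgmt.getPropertyKey('category')
// Vertex-centric index: keeps each vertex's "related" edges sorted by category
// ('relatedByCategory' is a hypothetical index name)
mgmt.buildEdgeIndex(related, 'relatedByCategory', Direction.BOTH, category)
mgmt.commit()

// Wait for the index to be registered, then reindex the existing edges
ManagementSystem.awaitRelationIndexStatus(graph, 'relatedByCategory', 'related').call()
mgmt = graph.openManagement()
related = mgmt.getEdgeLabel('related')
mgmt.updateIndex(mgmt.getRelationIndex(related, 'relatedByCategory'), SchemaAction.REINDEX).get()
mgmt.commit()

// Filter on the edge first, so only matching edges of the supernode are read,
// then hop to the neighbouring vertices.
g.V().has('recId', '123').
  bothE('related').has('category', 'category1').
  limit(1000).
  otherV().
  valueMap('recId', 'category').
  toList()

// The other idea from the answer: start from an ordinary vertex
// (hypothetical recId 456) and check whether it connects to the supernode,
// instead of expanding the supernode itself.
g.V().has('recId', '456').
  where(__.both('related').has('recId', '123')).
  hasNext()
```

With the edge-level filter, JanusGraph can use the vertex-centric index to read only the category1 edges of the supernode rather than loading all 500K+ neighbours and checking each vertex's category afterwards. Whether this brings the query down to a few seconds depends on the data and the cluster, so it is worth running .profile() on the traversal to confirm that the index is actually used.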