I have a test graph with fewer than a million nodes and probably a slightly higher number of edges. I'm using a remote Gremlin client to connect to a JanusGraph/Gremlin Server instance backed by 3 ScyllaDB backends.
I have nodes with several different labels: url, domain, host, and brand. The graph consists mainly of url, domain, and host nodes; there is only one brand node in the entire graph. The brand node looks like this:
{
  label: brand
  properties: {
    brand: string
  }
}
The following query completes in 1.5 ms. The brand property has a composite index:
g.V().hasLabel('brand').has('brand','stackoverflow');
The query below hits the 30 s timeout. Based on the data I imported into the graph, I expect it to return only one result, which I verified by testing it with a limit:
g.V().hasLabel('brand')
My questions
Thank you
As you guessed, this is most likely timing out because of a full graph scan: vertex labels are not indexed in JanusGraph. There is an open issue for this: https://github.com/JanusGraph/janusgraph/issues/283
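Until that issue is resolved, a common workaround is to store the label a second time as an ordinary property and put a composite index on that property. The following is a minimal sketch for the JanusGraph Gremlin console, assuming it runs server-side (where `graph` is the JanusGraph instance and `g` its traversal source) and using a hypothetical property key `vertexType` and a hypothetical index name `byVertexType`:

```groovy
// Hypothetical key "vertexType" that duplicates the vertex label as a property
mgmt = graph.openManagement()
vertexTypeKey = mgmt.makePropertyKey('vertexType').dataType(String.class).make()
// Composite index on the new key so equality lookups can avoid a full scan
mgmt.buildIndex('byVertexType', Vertex.class).addKey(vertexTypeKey).buildCompositeIndex()
mgmt.commit()

// Vertices are only found through the index if they carry the property, e.g.
// g.addV('brand').property('vertexType', 'brand').property('brand', 'stackoverflow')
g.V().has('vertexType', 'brand')
```

Because the key is created in the same transaction as the index, the index should be usable right away, but vertices that already exist have no `vertexType` value and would need to be backfilled. Also keep in mind that a composite index on a low-selectivity value stores every vertex of a given label under one index entry, so this works well for a rare label like brand but may be slow for very common ones like url.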
In the first query, I suspect that JanusGraph's optimizer folds the hasLabel() and has() steps together and answers the traversal from the composite index on brand.
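One way to check this is to append Gremlin's `profile()` step to each traversal and compare the metrics reported for the `JanusGraphStep`. The exact annotations differ between JanusGraph versions, so this sketch shows only what to run, not the output to expect:

```groovy
// Indexed lookup: expected to be answered from the composite index on 'brand'
g.V().hasLabel('brand').has('brand', 'stackoverflow').profile()

// Label-only filter: limit(1) lets the scan stop at the first match instead of
// timing out, while the profile should still show no index being used
g.V().hasLabel('brand').limit(1).profile()
```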