gremlin, janusgraph, gremlin-server

gremlin hasLabel query times out


I have a test graph with fewer than a million nodes and probably slightly more edges. I'm using a remote Gremlin client to connect to a JanusGraph/Gremlin Server instance backed by three Scylla backends.

I have nodes with several different labels: url, domain, host, and brand. The graph contains mainly url, domain, and host nodes; there is only one brand node in the entire graph. The brand node looks like this:

{
    label: brand 
    properties: {
        brand: string
    }
}

I am able to run the following query in 1.5 ms. The brand property has a composite index.

g.V().hasLabel('brand').has('brand','stackoverflow');

The query below hits the 30 s timeout. I expect it to return only one result, based on the data I imported into the graph. I verified this by running the query with a limit.

g.V().hasLabel('brand')

My questions

  • Why does this time out?
  • Is JanusGraph scanning through all nodes in the graph to try to find a single node labeled 'brand'? Is there no default index on labels?
  • Why does the first query execute fine when the first steps for both are the same?

Thank you


Solution

    • Why does this time out?
    • Is JanusGraph scanning through all nodes in the graph to try to find a single node labeled 'brand'? Is there no default index on labels?

    As you guessed, this is most likely timing out because of a full graph scan: vertex labels are not indexed in JanusGraph, so a traversal that filters on hasLabel alone has to visit every vertex. There is an open issue for this: https://github.com/JanusGraph/janusgraph/issues/283
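
One common workaround, sketched here under assumptions rather than taken from the question's schema, is to mirror the label into an ordinary property and put a composite index on that property. The property name `kind` and the index name `byKind` are hypothetical, and the snippet assumes a Gremlin Console or server-side session in which `graph` is the JanusGraph instance and `g` is its traversal source.

```groovy
// Assumption: 'kind' and 'byKind' are made-up names used only for illustration.
mgmt = graph.openManagement()
kind = mgmt.makePropertyKey('kind').dataType(String.class).make()
mgmt.buildIndex('byKind', Vertex.class).addKey(kind).buildCompositeIndex()
mgmt.commit()

// Tag the existing brand vertex, locating it via the already indexed 'brand' property.
g.V().has('brand', 'stackoverflow').property('kind', 'brand').iterate()
g.tx().commit()

// Equality lookup on the indexed property instead of hasLabel alone.
g.V().has('kind', 'brand')
```

Because the property key and the index are created in the same management transaction, the index should be usable without a reindex. This approach suits rare values such as 'brand'; setting the same property on hundreds of thousands of url vertices would put them all under one index entry, so it may be better to tag only the rare vertex types.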

    • Why does the first query execute fine when the first steps for both are the same?

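    In this case I suspect that JanusGraph's optimizer folds both filters into the initial vertex step and answers the lookup from the composite index on brand, so only the matching vertex is read and the label check is applied to it afterwards.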
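
To check this rather than guess, TinkerPop's `profile()` step can be appended to each traversal. This is a minimal sketch that reuses the queries from the question; the `limit(1)` on the second traversal is added only so that it has a chance of returning before the timeout, and the exact annotations printed depend on the JanusGraph version.

```groovy
// Indexed lookup: the JanusGraphStep entry should name the index that was used.
g.V().hasLabel('brand').has('brand', 'stackoverflow').profile()

// Label-only lookup: no index should be reported for this one.
g.V().hasLabel('brand').limit(1).profile()
```

For the label-only traversal, the server log will typically also carry a warning that the query requires iterating over all vertices, which is a quick way to spot unindexed queries.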