I want to find all the paths from the leaf node(E) to the root node(A). (Not for any specific node, so no id or filed filter here)
The data model is as shown in the image.
I used a basic Cypher query to find the paths (A to E):
MATCH path=(:A)-[:USE*]->(:E) RETURN path
It keeps running and never finishes.
I tried to get the paths from C to E using:
MATCH path=(:C)-[:USE*]->(:E) RETURN path
This is query takes up to 18 seconds to return 18k paths. I have tried USING SCAN but no improvements in the time.
How can I improve this traversal to return results in less time? I need to get the results in 4-5 seconds.
System Configurations:
System memory is 32GB
Storage: SSD
Current Neo4j Conf:
heap size: initial-12GB max-12GB
cache: 12GB
Database size: 1.6GB
If your DB paths have a lot of variability (e.g., C
nodes are not always followed by D
nodes), but you know that you always want a specific path pattern (e.g., A->B->C->D->E
), then specifying an explicit pattern should be much faster:
MATCH path=(:A)-[:USE]->(:B)-[:USE]->(:C)-[:USE]->(:D)-[:USE]->(:E)
RETURN path
Variable-length path patterns are expensive, as they have exponential complexity (based on the depth of the path).
[UPDATE]
Even when the DB paths you are interested in do not have any variability (e.g., the path from A
to E
always looks like A->B->C->D->E
), but there are longer paths that include those paths (e.g., if E
nodes have long outgoing USE
paths), then a variable-length path pattern with no upper bound will force neo4j to test all those outgoing paths. In this case, if you still wanted to use a variable-length path pattern, then you should specify a fixed upper bound, since you know the exact length of the paths of interest (4
, in this example):
MATCH path=(:A)-[:USE*..4]->(:E) RETURN path