Search code examples
neo4jcypherneo4j-apoc

How to make multi level path traversal faster in Neo4j


I want to find all the paths from the leaf node(E) to the root node(A). (Not for any specific node, so no id or filed filter here)

The data model is as shown in the image.

I used a basic Cypher query to find the paths (A to E):

MATCH path=(:A)-[:USE*]->(:E) RETURN path

It keeps running and never finishes.

I tried to get the paths from C to E using:

MATCH path=(:C)-[:USE*]->(:E) RETURN path

This is query takes up to 18 seconds to return 18k paths. I have tried USING SCAN but no improvements in the time.

How can I improve this traversal to return results in less time? I need to get the results in 4-5 seconds.

System Configurations:

System memory is 32GB

Storage: SSD

Current Neo4j Conf:

heap size: initial-12GB max-12GB

cache: 12GB

Database size: 1.6GB

Data Model

Query Profile


Solution

  • If your DB paths have a lot of variability (e.g., C nodes are not always followed by D nodes), but you know that you always want a specific path pattern (e.g., A->B->C->D->E), then specifying an explicit pattern should be much faster:

    MATCH path=(:A)-[:USE]->(:B)-[:USE]->(:C)-[:USE]->(:D)-[:USE]->(:E)
    RETURN path
    

    Variable-length path patterns are expensive, as they have exponential complexity (based on the depth of the path).

    [UPDATE]

    Even when the DB paths you are interested in do not have any variability (e.g., the path from A to E always looks like A->B->C->D->E), but there are longer paths that include those paths (e.g., if E nodes have long outgoing USE paths), then a variable-length path pattern with no upper bound will force neo4j to test all those outgoing paths. In this case, if you still wanted to use a variable-length path pattern, then you should specify a fixed upper bound, since you know the exact length of the paths of interest (4, in this example):

    MATCH path=(:A)-[:USE*..4]->(:E) RETURN path