Search code examples
neo4jcypher

How to write a cypher query which recursively traverses a graph in Neo4j to find all the nodes visited


I have created a graph in neo4j community edition running in a docker container in my local using below cypher.

create (a:Account{name:'account.a'}),
(b:Account{name:'account.b'}),
(c:Account{name:'account.c'}),
(d:Account{name:'account.d'}),
(e:Account{name:'account.e'}),
(f:Account{name:'account.f'}),
(x:Account{name:'account.x'}),
(y:Account{name:'account.y'}),
(cas:Calculation{name:'account.a:simple_interest'}),
(cac:Calculation{name:'account.a:compound_interest'}),
(cbs:Calculation{name:'account.b:simple_interest'}),
(cbc:Calculation{name:'account.b:compound_interest'}),
(cxs:Calculation{name:'account.x:simple_interest'}),
(cxc:Calculation{name:'account.x:compound_interest'}),
(a)-[:RESULTS_FROM]->(cas),
(a)-[:RESULTS_FROM]->(cac),
(b)-[:RESULTS_FROM]->(cbs),
(b)-[:RESULTS_FROM]->(cbc),
(x)-[:RESULTS_FROM]->(cxs),
(x)-[:RESULTS_FROM]->(cxc),
(cas)-[:DEPENDS_ON]->(d),
(cas)-[:DEPENDS_ON]->(e),
(cbs)-[:DEPENDS_ON]->(c),
(cbc)-[:DEPENDS_ON]->(y),
(cac)-[:DEPENDS_ON]->(f),
(cac)-[:DEPENDS_ON]->(d),
(cxs)-[:DEPENDS_ON]->(d),
(cxc)-[:DEPENDS_ON]->(d)

There are two kinds of nodes in the graph - Account and Calculation. Account node has a "RESULTS_FROM" relation to Calculation node with the same name + interest type. A Calculation node has a "DEPENDS_ON" relation with 'n' number of other Account nodes.

Below is how the graph looks like.

enter image description here

I need to write a cypher which starts at an Account node and finds all it's dependencies in the entire graph. A dependency in this context is the Calculation node(s) it "RESULTS_FROM" and the Account nodes that those Calculation node(s) "DEPENDS_ON". It should do this recursively in all direction until it finds an Account node which doesn't have any "RESULTS_FROM" relation to any other Calculation nodes.

In the snapshot, if I start at "account.b", it will follow both "RESULTS_FROM" relation (via blue Calculation nodes) which then point to "account.y" and "account.c" via "DEPENDS_ON".

So the output should be - account.y and account.c

Similarly, the output for "account.x" should be "account.d". We stop when any Account doesn't "RESULT_FROM" any Calculation.

I tried different things like below but I am not sure what's the ideal way to do this.

MATCH (startNode:YourLabel {name: 'account.a'})
CALL {
  WITH [startNode] AS nodes, [] AS rels
  UNWIND nodes AS node
  FOREACH(ignoreMe IN [1] |
    MATCH (node)-[r]->(relatedNode)
    WHERE NOT relatedNode IN nodes
    SET nodes = nodes + relatedNode
    SET rels = rels + r
  )
}
RETURN DISTINCT nodes AS reachableNodes

ps- the data here is just a sample and in reality, the graph will have 100s of Account and Calculation nodes. The Relations between the nodes remain the same though. And no Calculation node will depend on a Account node with the same name + interest_type


Solution

  • One option is using APOC. One of the advantages is that it easily gives you only the "end" nodes that you want:

    MATCH(n:Account{name:'account.b'})
    CALL apoc.path.subgraphAll(n, {
        relationshipFilter:"RESULTS_FROM>|DEPENDS_ON>",
        labelFilter: ">Account"
    })
    YIELD nodes
    RETURN nodes
    

    returns:

    ╒════════════════════════════════════════════════════════════════╕
    │nodes                                                           │
    ╞════════════════════════════════════════════════════════════════╡
    │[(:Account {name: "account.y"}), (:Account {name: "account.c"})]│
    └────────────────────────────────────────────────────────────────┘
    

    EDIT: relationshipFilter was changed from > to RESULTS_FROM>|DEPENDS_ON> according to @cybersam's remark