neo4j neo4jclient

Looping through nodes in Neo4j and creating relationships


This is a follow-up to another SO question (Neo4j 2.0 Merge with unique constraints performance bug?), but I'm trying a different approach.

MATCH (c:Contact),(a:Address), (ca:ContactAddress)
WITH c,a,collect(ca) as matrix
FOREACH (car in matrix | 
MERGE 
(c {ContactId:car.ContactId})
-[r:CONTACT_ADDRESS {ContactId:car.ContactId,AddressId:car.AddressId}]->
(a {AddressId:car.AddressId}))

This query locks up the Neo4j server, and I'm trying to wrap my head around why.
My thought process behind the query is the following:

  • I want to select all Contact and Address nodes (as well as ContactAddress nodes)
  • I want to loop through all ContactAddress nodes (which contain the relationship data between Contact and Address) and relate the Contact and Address nodes to each other.

When I run the above code, the server sits at about 40% CPU and memory continues to climb. I stopped it after the browser (myserver:7474/browser) disconnected, reset my database and tried again, this time using the following:

match (c:Contact),(a:Address), (ca:ContactAddress)
WITH c,a,collect(distinct ca) as matrix
foreach (car in matrix | 
CREATE 
(c {ContactId:car.ContactId})
-[r:CONTACT_ADDRESS {ContactId:car.ContactId,AddressId:car.AddressId}]->
(a {AddressId:car.AddressId}))

Same result: the Neo4j database locks up and disconnects while CPU stays pegged and RAM usage continues to climb. Is there a loop here that I'm not seeing?

I've also tried this (with the same hang):

FOREACH(row in {PassedInList} | 
    MERGE (c:Contact {ContactId:row.ContactId})
    MERGE (a:Address {AddressId:row.AddressId})
    MERGE (c)-[r:CONTACT_ADDRESS]->(a)
    )

RESOLVED:

MATCH (ca:ContactAddress)
MATCH (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
MERGE p = (c)
          -[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->
          (a)

Solution

  • When you write match (c:Contact),(a:Address), (ca:ContactAddress), with 3 disconnected nodes, Neo4j builds the Cartesian product of all three, i.e. every combination of one node from each set. If you had 100 of each type of node, that is 100 x 100 x 100 = 1,000,000 rows.

    Try this:

    MATCH (ca:ContactAddress), (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
    MERGE (c)-[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->(a)
    

    That will match every :ContactAddress node, and only the :Contact and :Address nodes whose IDs match it. Then it'll create the relationship (if it doesn't already exist).

    If you want to be clearer, you could also split the MATCH, i.e.:

    MATCH (ca:ContactAddress)
    MATCH (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
    MERGE (c)-[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->(a)
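
To get a feel for the difference in scale, here is a rough Python sketch that only counts rows. It is not how Neo4j executes the query. The three lists are hypothetical stand-ins for the node sets (100 of each, as in the example above), and the dictionaries play the role of an ID lookup, which assumes the ID properties are unique.

```python
from itertools import product

# Hypothetical stand-ins for the three node sets, 100 of each.
contacts = [{"ContactId": i} for i in range(100)]
addresses = [{"AddressId": i} for i in range(100)]
contact_addresses = [{"ContactId": i, "AddressId": i} for i in range(100)]

# Disconnected pattern: every combination of one node from each set.
cartesian_rows = sum(1 for _ in product(contacts, addresses, contact_addresses))

# Keyed pattern: start from each ContactAddress and look up its two endpoints.
# Assumption: ContactId and AddressId are unique, so a dict can act as the lookup.
contact_by_id = {c["ContactId"]: c for c in contacts}
address_by_id = {a["AddressId"]: a for a in addresses}
keyed_rows = [
    (contact_by_id[ca["ContactId"]], address_by_id[ca["AddressId"]])
    for ca in contact_addresses
    if ca["ContactId"] in contact_by_id and ca["AddressId"] in address_by_id
]

print(cartesian_rows)   # 1000000
print(len(keyed_rows))  # 100
```

The first count grows with the product of the three set sizes, while the second grows only with the number of ContactAddress rows, which is why the rewritten MATCH finishes and the original one does not.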