Why are results of multiple merges reduced? neo4j


I am trying to import data from a CSV. Each row has a handful of columns whose values match the name property of my nodes (Node). I then try to relate those matched nodes to another node (OtherNode). The issue is that when I use multiple MATCH patterns, the number of matched nodes seems to decrease as I add another, as if the patterns were combined with AND rather than OR.

Can you please explain how you would match multiple sets of nodes from the multiple names given in each row? Could you also explain why the number of nodes is reduced as more MATCH patterns are added?

LOAD CSV WITH HEADERS FROM "file:///MasterNode.csv" AS row
        MATCH(n1:Node{name: row.`Node Name 1`}),
        (n2:Node{name: row.`Node Name 2`}),
        (n3:Node{name: row.`Node Name 3`}),
        (n4:Node{name: row.`Node Name 4`}),
        (n5:Node{name: row.`Node Name 5`}),
        (n6:Node{name: row.`Node Name 6`}),
        (n7:Node{name: row.`Node Name 7`}),
        (n8:Node{name: row.`Node Name 8`}),
        (n9:Node{name: row.`Node Name 9`}),
        (on:OtherNode{name: row.`Other Node Name`})
        MERGE (on)-[:DEPENDS_ON]->(n1)
        MERGE (on)-[:DEPENDS_ON]->(n2)
        MERGE (on)-[:DEPENDS_ON]->(n3)
        MERGE (on)-[:DEPENDS_ON]->(n4)
        MERGE (on)-[:DEPENDS_ON]->(n5)
        MERGE (on)-[:DEPENDS_ON]->(n6)
        MERGE (on)-[:DEPENDS_ON]->(n7)
        MERGE (on)-[:DEPENDS_ON]->(n8)
        MERGE (on)-[:DEPENDS_ON]->(n9)

Solution

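  • MATCH is not optional. When you use MATCH, you are asking for existing matches of the pattern in the graph, and if there is no match, that row is wiped out (it does not make sense to return or keep processing results that don't adhere to what you're looking for). Your MATCH lists ten patterns, so a row survives only if every one of them finds a node; a single missing name drops the whole row before any MERGE runs, which is why it behaves like an AND.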

    If you do not know if the node exists in the graph and want to keep the row even in that case, you can use OPTIONAL MATCH. However, you will not be able to MERGE a relationship using the variable of a null node result. You would need to use a conditional (using the FOREACH trick or APOC's conditional procedures) to do this.
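
    A minimal sketch of the FOREACH trick mentioned above, shown for the first two name columns only (the remaining columns would follow the same pattern). Each FOREACH iterates over a one-element list when the node was found and over an empty list when it is null, so the MERGE only runs for nodes that exist. The loop variable name ignored is arbitrary.

        LOAD CSV WITH HEADERS FROM "file:///MasterNode.csv" AS row
        // Still a plain MATCH: a row without its OtherNode has nothing to link from
        MATCH (on:OtherNode {name: row.`Other Node Name`})
        // OPTIONAL MATCH keeps the row and binds null when no node has that name
        OPTIONAL MATCH (n1:Node {name: row.`Node Name 1`})
        OPTIONAL MATCH (n2:Node {name: row.`Node Name 2`})
        // Empty list => zero iterations => the MERGE is skipped for a null node
        FOREACH (ignored IN CASE WHEN n1 IS NULL THEN [] ELSE [1] END |
          MERGE (on)-[:DEPENDS_ON]->(n1)
        )
        FOREACH (ignored IN CASE WHEN n2 IS NULL THEN [] ELSE [1] END |
          MERGE (on)-[:DEPENDS_ON]->(n2)
        )

    With nine columns this becomes repetitive, which is one reason to prefer an approach that keeps all the names under a single variable.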

    Usually for these cases the CSV is formatted differently, as a two-column CSV with Other Node Name and Node Name. The import would then look like this:

        LOAD CSV WITH HEADERS FROM "file:///MasterNode.csv" AS row
        MATCH (n:Node {name: row.`Node Name`}),
              (on:OtherNode {name: row.`Other Node Name`})
        MERGE (on)-[:DEPENDS_ON]->(n)
    

    Note that the value for Other Node Name would not be distinct across rows. If you translated your current CSV to this format, what you had on one row would become 9 rows (each with the same Other Node Name but a different Node Name). When a node doesn't exist, only that row is wiped out and a MERGE is never attempted for it; the other rows are unaffected.
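
    One way to preview that reshaping without editing the file by hand is to unwind the nine name columns in a read-only query. This is only a sketch: it returns the rows in the two-column shape and writes nothing to the graph. The filter is there because blank cells may come through as null or as an empty string, depending on the file.

        LOAD CSV WITH HEADERS FROM "file:///MasterNode.csv" AS row
        // One output row per name column, instead of nine columns on one row
        UNWIND [row.`Node Name 1`, row.`Node Name 2`, row.`Node Name 3`,
                row.`Node Name 4`, row.`Node Name 5`, row.`Node Name 6`,
                row.`Node Name 7`, row.`Node Name 8`, row.`Node Name 9`] AS nodeName
        // Skip blank cells, whichever way they were read
        WITH row, nodeName
        WHERE nodeName IS NOT NULL AND nodeName <> ''
        RETURN row.`Other Node Name` AS `Other Node Name`,
               nodeName AS `Node Name`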

    If you're determined to keep your current CSV format, then you need to change your query. By collecting the node names into a list and performing a MATCH ... WHERE n.name IN the list, we can look up all of the nodes at once (an index lookup, if :Node(name) is indexed), ignoring any names that don't match but keeping the matches under the same variable. Then we only need a single MERGE to create the relationships for all of them.

        LOAD CSV WITH HEADERS FROM "file:///MasterNode.csv" AS row
        WITH row, [row.`Node Name 1`, row.`Node Name 2`, row.`Node Name 3`,
                   row.`Node Name 4`, row.`Node Name 5`, row.`Node Name 6`,
                   row.`Node Name 7`, row.`Node Name 8`, row.`Node Name 9`] AS nodeNames
        MATCH (on:OtherNode {name: row.`Other Node Name`})
        MATCH (n:Node)
        WHERE n.name IN nodeNames
        MERGE (on)-[:DEPENDS_ON]->(n)
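
    The index lookup mentioned above only happens if the name property is actually indexed. A sketch for creating the indexes before running the import, assuming Neo4j 4.0 or later (the index names node_name and other_node_name are made up for illustration; older 3.x releases use the form CREATE INDEX ON :Node(name) instead):

        // Run once, before the import
        CREATE INDEX node_name FOR (n:Node) ON (n.name);
        CREATE INDEX other_node_name FOR (o:OtherNode) ON (o.name);

    After the import, a quick count of the relationships shows whether the expected number was created:

        MATCH (:OtherNode)-[r:DEPENDS_ON]->(:Node)
        RETURN count(r) AS dependsOnCount;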