Tags: graph, neo4j, cypher, property-graph

Neo4j MATCH then MERGE too many DB hits


This is the query:

MATCH (n:Client {curp: 'SOME_VALUE'})
WITH n
MATCH (n)-[:HIZO]-()-[r:FB]-()-[:HIZO]-(m:Client)
WHERE ID(n) <> ID(m)
  AND NOT (m)-[:FB]->(n)
MERGE (n)-[:FB]->(m)
RETURN m.curp

PROFILE output: [screenshot not available]

Why is the MERGE stage getting so many DB hits when the query has already narrowed the (n, m) pairs down to 6,781 rows?

The details of that stage show:

n, m, r
(n)-[ UNNAMED155:FB]->(m)

Solution

  • Keep in mind that a Cypher query builds up rows, and each operation in the query is executed once for every row built up so far.

    Because the pattern in your MATCH may find multiple paths to the same :Client, it builds up multiple rows with the same n and m (each with a possibly different r; since you don't use r anywhere else in the query, I'd encourage you to remove that variable).

    This means that even though you intend to MERGE a single relationship between n and each distinct m, the MERGE actually runs once for every duplicate row of n and m. The first of those MERGEs creates the relationship; the rest waste cycles matching the relationship that was just created, without doing anything else.

    So we should be able to lower our DB hits by keeping only distinct pairs of n and m before doing the MERGE.
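    To check how much duplication the MATCH produces on your own data, a minimal read-only sketch like the following counts how many rows each (n, m) pair contributes. The aliases pathsPerPair, rowCount, distinctPairs and maxPathsPerPair are hypothetical names chosen for illustration. If rowCount is much larger than distinctPairs, most of the MERGE executions in the original query are redundant.

```cypher
// Read-only check: there is no MERGE here, so nothing is written.
MATCH (n:Client {curp: 'SOME_VALUE'})
MATCH (n)-[:HIZO]-()-[:FB]-()-[:HIZO]-(m:Client)
WHERE n <> m
  AND NOT (m)-[:FB]->(n)
// Group the rows by pair and count how many paths led to each pair.
WITH n, m, count(*) AS pathsPerPair
// Roll the groups up: total rows built vs. distinct pairs.
RETURN sum(pathsPerPair) AS rowCount,
       count(*) AS distinctPairs,
       max(pathsPerPair) AS maxPathsPerPair
```

    rowCount should correspond to the number of rows that reach the MERGE in the original plan, while distinctPairs is the number of MERGE executions left after WITH DISTINCT n, m.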

    Also, if the query guarantees that the (n)-[:FB]->(m) relationship doesn't already exist, we can use CREATE instead of MERGE, which should save some DB hits because MERGE always attempts a MATCH first. (Note that the NOT (m)-[:FB]->(n) filter checks the opposite direction, so on its own it does not give that guarantee.)
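    If you want to compare CREATE against MERGE yourself, a sketch of the CREATE variant might look like this. The extra NOT (n)-[:FB]->(m) check after WITH DISTINCT is my own assumption: without it, CREATE (unlike MERGE) would duplicate an (n)-[:FB]->(m) relationship that already exists. PROFILE actually executes the query, writes included, so run it against a test database. As the edit at the end of this answer notes, CREATE did not turn out faster in practice, so treat this as something to measure, not a recommendation.

```cypher
PROFILE
// Sketch of the CREATE alternative, meant only for profiling against MERGE.
MATCH (n:Client {curp: 'SOME_VALUE'})
MATCH (n)-[:HIZO]-()-[:FB]-()-[:HIZO]-(m:Client)
WHERE n <> m
  AND NOT (m)-[:FB]->(n)
// Collapse duplicate paths so each pair is handled once.
WITH DISTINCT n, m
// Assumed extra guard: skip pairs that already have the n -> m relationship,
// because CREATE does not check for an existing one the way MERGE does.
WHERE NOT (n)-[:FB]->(m)
CREATE (n)-[:FB]->(m)
RETURN m.curp
```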

    An improved query might look like this:

    MATCH (n:Client{curp:'SOME_VALUE'}) 
    WITH n 
    MATCH (n)-[:HIZO]-()-[:FB]-()-[:HIZO]-(m:Client) 
    WHERE n <> m
    AND NOT (m)-[:FB]->(n) 
    WITH DISTINCT n, m
    MERGE (n)-[:FB]->(m) 
    RETURN m.curp
    

    EDIT

    I've changed the query back to MERGE for the :FB relationship, since attempts to use CREATE instead turned out to be less performant.