Neo4j 3.5 Query Performance Issue


I have the following query running in Neo4j (Community Edition 3.5):

MATCH (a:master_node:PERF:Application)-[r1]->(n:master_node:PERF)-[r:AFFINITY]->(m:master_node:PERF)<-[r2]-(a1:master_node:PERF:Application)
WHERE exists(n.latest_ingestion)
  AND exists(m.latest_ingestion)
  AND id(a) <> id(a1)
MERGE (a)<-[:APP_AFFINITY]-(a1)

My Neo4j memory configuration is as follows:

heap_size : 8GB
page_cache : 4GB

I have an index on the Application label for the name property, and the query above runs over 100k nodes. However, it takes a long time to complete and consumes a lot of memory.

Please help me improve the performance.


Solution

  • You're not using the name property in this query, so your index won't help. The only indexes that may help are on :master_node(latest_ingestion) or :PERF(latest_ingestion); depending on database statistics, these may let the planner use index scans instead of label scans.
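
    A minimal sketch of how to check this, assuming you choose :master_node for the index (:PERF would work the same way). The index uses the Neo4j 3.5 syntax and should be run as its own statement, because 3.5 does not allow schema and data changes in the same transaction. EXPLAIN then shows the plan without executing the MERGE, so you can see whether n and m are now found by NodeIndexScan rather than NodeByLabelScan.

    ```cypher
    // Statement 1 (run on its own): index the property the WHERE clause filters on.
    CREATE INDEX ON :master_node(latest_ingestion);

    // Statement 2: wait until the new index is online before checking plans.
    CALL db.awaitIndexes();

    // Statement 3: EXPLAIN only plans the query, it does not run the MERGE.
    // Look for NodeIndexScan instead of NodeByLabelScan for n and m.
    EXPLAIN
    MATCH (a:master_node:PERF:Application)-[r1]->(n:master_node:PERF)-[r:AFFINITY]->(m:master_node:PERF)<-[r2]-(a1:master_node:PERF:Application)
    WHERE exists(n.latest_ingestion)
      AND exists(m.latest_ingestion)
      AND id(a) <> id(a1)
    MERGE (a)<-[:APP_AFFINITY]-(a1);
    ```

    Whether the planner actually prefers the index scan depends on how selective latest_ingestion is in your data, so compare the plans before and after.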

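    Also, you may want to batch these updates so that they are committed in several smaller transactions instead of building up one large transaction, for example with apoc.periodic.iterate() from the APOC procedures library. Something like: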

    CALL apoc.periodic.iterate("
      MATCH (a:master_node:PERF:Application)-->(n:master_node:PERF)-[:AFFINITY]->(m:master_node:PERF)<--(a1:master_node:PERF:Application)
      WHERE exists(n.latest_ingestion)
        AND exists(m.latest_ingestion)
        AND id(a) <> id(a1)
      RETURN a, a1",
      "MERGE (a)<-[:APP_AFFINITY]-(a1)", 
      {}) YIELD batches, total, errorMessages
    RETURN batches, total, errorMessages
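
    A variant of the same call, offered as a sketch rather than a tested fix: because several (n, m) paths can connect the same two applications, the driving query can return the same (a, a1) pair many times, and each duplicate row triggers another MERGE. Returning DISTINCT pairs removes that repeated work. The batchSize of 1000 is an assumed starting value to tune for your data; parallel: false keeps batches from contending for locks on the same application nodes.

    ```cypher
    // Same batched MERGE, but each (a, a1) pair is emitted only once,
    // and the batching options are written out (batchSize 1000 is an assumption to tune).
    CALL apoc.periodic.iterate("
      MATCH (a:master_node:PERF:Application)-->(n:master_node:PERF)-[:AFFINITY]->(m:master_node:PERF)<--(a1:master_node:PERF:Application)
      WHERE exists(n.latest_ingestion)
        AND exists(m.latest_ingestion)
        AND id(a) <> id(a1)
      RETURN DISTINCT a, a1",
      "MERGE (a)<-[:APP_AFFINITY]-(a1)",
      {batchSize: 1000, parallel: false}) YIELD batches, total, errorMessages
    RETURN batches, total, errorMessages
    ```

    Note that with id(a) <> id(a1) every pair is matched in both orders, so APP_AFFINITY relationships end up being created in both directions. If one relationship per pair is enough, changing the condition to id(a) < id(a1) halves the number of pairs to process, but it also changes which relationships get created.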