Tags: neo4j, cypher, neo4j-apoc

Neo4j: how can a long-running query be split and executed in smaller chunks?


My import.csv creates many nodes, and the subsequent MERGE now builds a huge Cartesian product that runs into the transaction timeout because the data has grown so much. I've set the transaction timeout to 1 second because every other query is very fast and should never take longer than one second to finish.

Is there a way to split or execute this specific query in smaller chunks to prevent a timeout?

Raising or disabling the transaction timeout in neo4j.conf is not an option, because the Neo4j service needs a restart for every change made to the config.

The query from my import script that hits the timeout:

 MATCH (l:NameLabel)
 MATCH (m:Movie {id: l.id,somevalue: l.somevalue})
 MERGE (m)-[:LABEL {path: l.path}]->(l);

Node counts: 1,000 Movie, 2,500 NameLabel


Solution

  • You can try installing APOC Procedures and using the procedure apoc.periodic.commit.

    CALL apoc.periodic.commit("
      MATCH (l:NameLabel)
      // skip NameLabel nodes that a Movie already points to
      WHERE NOT (l)<-[:LABEL]-(:Movie)
      WITH l LIMIT $limit
      MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
      MERGE (m)-[:LABEL {path: l.path}]->(l)
      RETURN count(*)
    ", {limit: 1000})
    

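    The inner query is executed repeatedly, each run in its own transaction, until it returns 0. Because the WHERE clause skips NameLabel nodes that already have an incoming LABEL relationship (the MERGE creates it in the Movie -> NameLabel direction), each run picks up the next batch. Labels are case-sensitive, so the label must be spelled NameLabel exactly as in the original query.

    You can change the batch size through the limit parameter; keep it small enough that a single batch finishes within the 1-second timeout. One caveat: NameLabel nodes without a matching Movie are never linked, so they are selected again on every run. If at least limit of them exist, a run can return 0 and stop the loop before every matchable node has been linked.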

    Note: remember to install the APOC Procedures release that matches the version of Neo4j you are using. Check the Version Compatibility Matrix.
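A variant that avoids the early-stop caveat is apoc.periodic.iterate, which streams every NameLabel node once and applies the inner statement to those rows in batches, committing each batch in its own transaction. This is a sketch, assuming APOC is installed and the label is spelled NameLabel; batchSize: 500 is an arbitrary example value, and parallel: false runs the batches one after another so they do not contend for locks on the same Movie nodes. The outer CALL stays open while the batches run, so it is worth confirming on a copy of the data that your Neo4j/APOC version does not cut that outer call off at the 1-second limit.

```cypher
// Sketch, assuming APOC is installed and the label is spelled NameLabel.
// First statement: stream every NameLabel node exactly once.
// Second statement: applied to the streamed rows in batches,
// each batch committed in its own transaction.
// batchSize: 500 is an example value; parallel: false runs batches sequentially.
CALL apoc.periodic.iterate(
  "MATCH (l:NameLabel) RETURN l",
  "MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
   MERGE (m)-[:LABEL {path: l.path}]->(l)",
  {batchSize: 500, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Here total is the number of NameLabel rows streamed, and errorMessages should be empty if every batch committed. Very old APOC releases may require the inner statement to read the row from a parameter instead of referring to l directly.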