
How to improve performance of LOAD CSV in NEO4J


I am using the community edition of neo4j. I am trying to create 50,000 nodes and 93,400 relationships from a CSV file, but the LOAD CSV command takes around 40 minutes to create them. I use the py2neo package in Python to connect and run Cypher queries. The LOAD CSV command looks similar to the one below:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Sample.csv" AS row WITH row
MERGE (animal:Animal {name: row.`ANIMAL_NAME`})
ON CREATE SET animal = {name: row.`ANIMAL_NAME`, type: row.`TYPE`, status: row.`Status`, birth_date: row.`DATE`}
ON MATCH SET animal += {name: row.`ANIMAL_NAME`, type: row.`TYPE`, status: row.`Status`, birth_date: row.`DATE`}
MERGE (person:Person {name: row.`PERSON_NAME`})
ON CREATE SET person = {name: row.`PERSON_NAME`, age: row.`AGE`, address: row.`Address`, birth_date: row.`PERSON_DATE`}
ON MATCH SET person += {name: row.`PERSON_NAME`, age: row.`AGE`, address: row.`Address`, birth_date: row.`PERSON_DATE`}
MERGE (person)-[:OWNS]->(animal);

Infrastructure details:

dbms.memory.heap.max_size=16384M
dbms.memory.heap.initial_size=2048M
dbms.memory.pagecache.size=512M

neo4j_version: 3.3.9

How can I make it faster? Thanks in advance.


Solution

  • Ideally, you should be using the latest neo4j version, as there have been many performance improvements since 3.3.9. Since you already have indexes on :Animal(name) and :Person(name), the other main issue is probably that the Cypher planner generates an expensive Eager operation for your query (at least in neo4j 4.0.3). An Eager operator forces all CSV rows to be read and materialized before the next step runs, which also defeats the batching that USING PERIODIC COMMIT is meant to provide. Whenever you have performance issues, you should use EXPLAIN or PROFILE to see the operations that the Cypher planner generates.

    Try using this simpler query (which should do the same thing as yours). Using EXPLAIN in neo4j 4.0.3, this query does not use the Eager operation:

    :auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS row
    MERGE(animal:Animal {name: row.`ANIMAL_NAME`})
    SET animal += {type:row.`TYPE`, status:row.`Status`, birth_date:row.`DATE`}
    MERGE (person:Person { name:row.`PERSON_NAME`})
    SET person += {age:row.`AGE`, address:row.`Address`, birth_date:row.`PERSON_DATE`}
    MERGE (person)-[:OWNS]->(animal);
    

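    The :auto prefix is required in the Neo4j Browser in neo4j 4.x when using USING PERIODIC COMMIT. It is a Browser command rather than Cypher, so leave it out when sending the query from py2neo.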
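Since the question runs its queries through py2neo rather than the Browser, here is a minimal sketch of sending the simplified query from Python. It assumes py2neo's `Graph.run`, which executes a single statement in an auto-commit transaction (the kind USING PERIODIC COMMIT needs). The helper names `strip_browser_prefix` and `run_load` are hypothetical, and the connection details in the usage note are placeholders.

```python
# Simplified query from the answer, written without the Browser-only ":auto" prefix.
LOAD_QUERY = """
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS row
MERGE (animal:Animal {name: row.`ANIMAL_NAME`})
SET animal += {type: row.`TYPE`, status: row.`Status`, birth_date: row.`DATE`}
MERGE (person:Person {name: row.`PERSON_NAME`})
SET person += {age: row.`AGE`, address: row.`Address`, birth_date: row.`PERSON_DATE`}
MERGE (person)-[:OWNS]->(animal)
"""


def strip_browser_prefix(query: str) -> str:
    """Drop a leading ':auto' (a Neo4j Browser command the server does not parse)."""
    text = query.lstrip()
    if text.startswith(":auto"):
        text = text[len(":auto"):].lstrip()
    return text


def run_load(graph, query: str = LOAD_QUERY):
    """Send the load through a py2neo Graph (hypothetical helper).

    Graph.run sends one statement in an auto-commit transaction, which
    USING PERIODIC COMMIT requires; inside an explicit transaction the
    server would reject the query.
    """
    return graph.run(strip_browser_prefix(query))
```

Usage, with placeholder credentials: `from py2neo import Graph`, then `run_load(Graph("bolt://localhost:7687", auth=("neo4j", "password")))`. Stripping the prefix in code means the same query text can be pasted into the Browser with `:auto` and reused from Python unchanged.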
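To act on the EXPLAIN/PROFILE advice without reading the plan by eye, a small tree walk can flag Eager operators. This is a sketch under assumptions: it expects the plan as a nested dict with `operatorType` and `children` keys (roughly how Neo4j reports plan metadata to clients, but check what your client actually returns), the sample tree is hand-built rather than real output, and in 4.x operator names may carry a database suffix such as `Eager@neo4j`. `EagerAggregation` is a different operator and is deliberately not counted.

```python
from typing import Any, Dict, List


def eager_operators(plan: Dict[str, Any]) -> List[str]:
    """Return the operatorType of every Eager node in a plan tree.

    Assumes each node is a dict with an "operatorType" string and an
    optional "children" list of nodes of the same shape.
    """
    found = []
    stack = [plan]
    while stack:
        node = stack.pop()
        op = node.get("operatorType", "")
        # Ignore a database suffix like "@neo4j"; exact match skips EagerAggregation.
        if op.split("@", 1)[0] == "Eager":
            found.append(op)
        stack.extend(node.get("children", []))
    return found


# Hand-built example tree (not real EXPLAIN output).
plan = {
    "operatorType": "ProduceResults@neo4j",
    "children": [
        {"operatorType": "Eager@neo4j",
         "children": [{"operatorType": "LoadCSV@neo4j", "children": []}]},
    ],
}
print(eager_operators(plan))  # ['Eager@neo4j']
```

An empty result means the planner did not insert an Eager step, which is what you want to see for the simplified query.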