
Neo4j Import Job from CSV with ancestry as the relationship


I have some data that I want to insert into neo4j.

Initially I wrote a script that generated all the nodes and relationships as a cql file. That worked nicely for a small data set, but when the data set grew, my system crashed.

Keep in mind I was using the neo4j-shell to input all the data.

I know I can batch insert the data with the batch importer, but my entire data set is a single table, and its ancestry column is what I used to create the relationships.

For example: 1 => 1.2, 1.2.1, 1.2.2, 1.3 and so forth. I converted the data to a .csv and imported it, which was really fast, and I was able to get all the nodes. But how do I go about creating the relationships in neo4j from just one table that holds ID, name, and ancestry?


Solution

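  • For a large dataset, combine USING PERIODIC COMMIT with LOAD CSV so that the import is committed in batches rather than in one large transaction. Create the nodes in a first pass and the relationships in a second pass:

    // A unique constraint also creates an index, which keeps the MATCH lookups below fast.
    CREATE CONSTRAINT ON (n:Data) ASSERT n.id IS UNIQUE;
    
    // Pass 1: create one node per CSV row.
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
    CREATE (n:Data {id: line.id, name: line.name});
    
    // Pass 2: link each node to its ancestor (assumes line.ancestry holds the ancestor's id).
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
    MATCH (n:Data {id: line.id}), (a:Data {id: line.ancestry})
    MERGE (n)-[:HAS_ANCESTOR]->(a);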
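
The second pass above only works if the ancestry column holds a single ancestor id. If it instead holds a dotted path such as 1.2.1, as the example in the question suggests, the parent has to be derived first. Here is a minimal Python sketch of that preprocessing step. It assumes that each row's ancestry value is that row's own path and that the parent's path is the same path with the last segment dropped. The helper names (parent_path, build_relationships, relationships_csv) and the parent_id output column are hypothetical, chosen for illustration only.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Tuple


def parent_path(path: str, sep: str = ".") -> Optional[str]:
    """Return the path with its last segment dropped, or None for a root path."""
    head, found, _tail = path.rpartition(sep)
    return head if found else None


def build_relationships(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (child id, parent id) pairs derived from the ancestry paths."""
    rows = list(rows)
    # Assumption: the ancestry column is each row's own dotted path.
    id_by_path = {row["ancestry"]: row["id"] for row in rows}
    pairs = []
    for row in rows:
        parent = parent_path(row["ancestry"])
        # Skip roots and rows whose parent path is missing from the data.
        if parent is not None and parent in id_by_path:
            pairs.append((row["id"], id_by_path[parent]))
    return pairs


def relationships_csv(source: str) -> str:
    """Turn the single-table CSV text into a two-column relationships CSV."""
    reader = csv.DictReader(io.StringIO(source))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "parent_id"])
    writer.writerows(build_relationships(reader))
    return out.getvalue()
```

With a file produced this way, the relationship pass could match on line.id and line.parent_id instead of line.ancestry. The same two-column file is also the kind of separate relationships input a batch importer typically expects, though the exact header format depends on the importer version.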