Search code examples
neo4jcypherload-csv

Efficient way to import multiple csv's in neo4j


I am working on creating a graph database in neo4j for a CALL dataset. The dataset is stored in csv file with following columns: Source, Target, Timestamp, Duration. Here Source and Target are Person id's (numeric), Timestamp is datetime and duration is in seconds (integer).

I modeled my graph where person are nodes(person_id as property) and call as relationship (time and duration as property). There are around 2,00,000 nodes and around 70 million relationships. I have a separate csv files with person id's which I used to create the nodes. I also added uniqueness constraint on the Person id's.

CREATE CONSTRAINT ON ( person:Person ) ASSERT (person.pid) IS UNIQUE

I didn't completely understand the working of bulk import so I wrote a python script to split my csv into 70 csv's where each csv has 1 million nodes (saved as calls_0, calls_1, .... calls_69). I took the initiative to manually run a cypher query changing the filename every time. It worked well(fast enough) for first few(around 10) files but then I noticed that after adding relationship from a file, the import is getting slower for the next file. Now it is taking almost 25 minutes for importing a file. Can someone link me to an efficient and easy way of doing it?

Here is the cypher query:

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source, 
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
MERGE (p1)-[rel:CALLS {time: time, duration: Duration}]->(p2)
RETURN count(rel)

I am using Neo4j 4.0.3


Solution

  • Your MERGE clause has to check for an existing matching relationship (to avoid creating duplicates). If you added a lot of relationships between Person nodes, that could make the MERGE clause slower.

    You should consider whether it is safe for you to use CREATE instead of MERGE.