I want load a CDR csv file with 648000 records to neo4j (4.4.10), But it is about 4 days and And it is not yet completed.
My CSV have 648000 records with 7 columns. and the size of file is about 48 MB. My computer have 100 GB RAM and intel Zeon E5 CPU.
the columns of CSV are:
OP_Name | TP_Name | Called_Number | OP_ANI | Setup_Time | Duration | OP_Price |
---|
the code that I use to load CSV in Neo4j is:
```Cypher
:auto load csv with headers from 'file:///cdr.csv' as line FIELDTERMINATOR ','
with line
where line['Called_Number'] is not null and line['OP_ANI'] is not null
with line['OP_ANI'] as OP_Phone,
(CASE line['OP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS OP_country,
line['Called_Number'] as Called_Phone,
(CASE line['TP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS TP_country,
line['Setup_Time'] as Setup_Time,
line['Duration'] as Duration,
line['OP_Price'] as OP_Price
call {
with OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
MERGE (c:Customer{phone: toInteger(Called_Phone)})
on create set c.country = TP_country
WITH c, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
CALL apoc.create.addLabels( c, [ c.country ] ) YIELD node
MERGE (c2:Customer{phone: toInteger(OP_Phone)})
on create set c2.country = OP_country
WITH c2, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price, c
CALL apoc.create.addLabels( c2, [ c2.country ] ) YIELD node
MERGE (c2)-[r:CALLED{setupTime: Setup_Time,
duration: Duration,
OP_Price: OP_Price}]->(c)
} IN TRANSACTIONS
```
How can I speed up the load operation?
MERGE
acts as an upsert in Neo4j. So the statement:
MERGE (c:Customer{phone: toInteger(Called_Phone)})
checks if there is a Customer
node with the given phone number is there or not. If it is, it performs the update otherwise creates the node. When there is a large number of nodes, this lookup can be very slow, and CSV import will be slow overall. Creating an index on the phone
property of Customer
should do the trick. You can create the index like this:
CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone)