I'm very new to Neo4j and I'm having difficulty creating a large number of relationships (~3.5m) between roughly 2.5m nodes. I am running the query below in the web-based interface to Neo4j on localhost. The query runs, but after about 5 minutes the browser shows a "Disconnected" message; other times I get a message complaining about the Java heap. I'm not sure whether my query is unsuitable for what I'd like to do, just inefficient, or whether I'm asking too much of the database. Can anyone advise on how I can create the relationships between my nodes?
The node structure is very basic: Customer: CSN, Location, Gender
The file I'm using for the relationships is also very simple: Relationships: SourceCSN, DestCSN
USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
MATCH (from:Customer {CSN: csvLine.SourceCSN}), (to:Customer {CSN: csvLine.DestCSN})
CREATE (from)-[:PAID]->(to)
I'm using Neo4j 2.1.1 on Windows 7 with 8 GB RAM.
Thanks in advance for any help / advice - anything will be gratefully received.
First of all, make sure that you have an index on the CSN property of the Customer label. Without one, each MATCH has to scan every Customer node for every line of the CSV. This can be a simple index or, as I'd guess fits your case, a unique constraint.
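As a rough sketch of what that could look like in the Cypher syntax of the 2.x releases, assuming CSN really is unique per customer (that is my guess, not something your question confirms). Pick one of the two statements, not both, since a unique constraint already brings its own index:

```cypher
// Option 1: unique constraint (assumes no two Customer nodes share a CSN).
// The constraint is backed by an index, so lookups on CSN can use it.
CREATE CONSTRAINT ON (c:Customer) ASSERT c.CSN IS UNIQUE;

// Option 2: plain index, for the case where CSN values may repeat.
// CREATE INDEX ON :Customer(CSN);
```

Run this once before the import and give the index time to come online. Also keep in mind that LOAD CSV hands every field over as a string, so the lookup will only find nodes whose CSN was stored as a string too.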
Secondly, I suggest first running a small import of 10 lines and analysing what the execution plan looks like. To do this, run the same import with the input limited to 10 rows (back up your database first); this should be done in the neo4j-shell:
PROFILE
LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
WITH csvLine
LIMIT 10
MATCH (from:Customer {CSN: csvLine.SourceCSN}), (to:Customer {CSN: csvLine.DestCSN})
CREATE (from)-[:PAID]->(to)