
Neo4j - creating a large number of relationships (3.6 million)


I'm very new to Neo4j and I'm having difficulty creating a large number of relationships (~3.5m) between roughly 2.5m nodes. I am trying to run the query below using the web-based interface to Neo4j on localhost. The query runs, but after about 5 minutes the browser shows a "Disconnected" message; other times I get an error about the Java heap. I'm not sure whether my query is unsuitable for what I'd like to do, just inefficient, or whether I'm asking too much of the database. Can anyone advise on how I can create the relationships between my nodes?

The node structure is very basic: nodes with the label Customer and the properties CSN, Location and Gender.

The file I'm using for the relationships is also very simple; its columns are SourceCSN and DestCSN.

USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
MATCH (from:Customer {CSN: csvLine.SourceCSN}), (to:Customer {CSN: csvLine.DestCSN})
CREATE (from)-[:PAID]->(to)

I'm using Neo4j 2.1.1 on Windows 7 with 8 GB of RAM.

Thanks in advance for any help or advice; anything will be gratefully received.


Solution

  • First of all, make sure that you have an index on the CSN property of the Customer label. This can be a simple index or, in your case, I guess a unique constraint.

    Secondly, I suggest first running a small import of 10 lines and analysing the execution plan. To do this, run the same import with the input limited to 10 rows. Back up your database first, because PROFILE actually executes the query and creates the relationships. This should be done in the neo4j-shell:

    PROFILE
    LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
    WITH csvLine
    LIMIT 10
    MATCH (from:Customer {CSN: csvLine.SourceCSN}), (to:Customer {CSN: csvLine.DestCSN})
    CREATE (from)-[:PAID]->(to);
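Put together, the steps above might look like the following sketch. It uses Neo4j 2.x Cypher and is meant to be run in neo4j-shell one statement at a time. It assumes CSN is meant to be unique per customer, which the question does not actually state. If it isn't unique, use the plain index instead of the constraint.

Index population happens in the background. Check that the index is reported as ONLINE before starting the import, for example with the neo4j-shell `schema` command.

The read-only check in the middle is an optional extra. It confirms that the CSV values actually find their nodes before anything is written. This matters because LOAD CSV returns every field as a string, so a CSN stored as a number would never match. The batch size of 1000 is simply the default that USING PERIODIC COMMIT uses when no number is given, not a tuned value.

```cypher
// Step 1 (Neo4j 2.x syntax): unique constraint on Customer.CSN.
// This also creates the index that the MATCH clauses below rely on.
// It fails if two Customer nodes already share a CSN. In that case use the
// plain index below instead (one or the other, not both on the same property):
// CREATE INDEX ON :Customer(CSN)
CREATE CONSTRAINT ON (c:Customer) ASSERT c.CSN IS UNIQUE;

// Step 2 (optional, read-only): do the first 1000 rows find both endpoints?
// If sourcesFound or destsFound is 0, the stored CSN type probably differs
// from the string LOAD CSV returns. For integer CSNs (an assumption about
// your data), compare against toInt(csvLine.SourceCSN) and toInt(csvLine.DestCSN).
LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
WITH csvLine
LIMIT 1000
OPTIONAL MATCH (from:Customer {CSN: csvLine.SourceCSN})
OPTIONAL MATCH (to:Customer {CSN: csvLine.DestCSN})
RETURN count(*) AS rowsChecked, count(from) AS sourcesFound, count(to) AS destsFound;

// Step 3: the full import, committing every 1000 rows.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///c:/datafiles/InterCustomerRelationships.csv" AS csvLine
MATCH (from:Customer {CSN: csvLine.SourceCSN}), (to:Customer {CSN: csvLine.DestCSN})
CREATE (from)-[:PAID]->(to);
```

On recent Neo4j releases this syntax has changed. Constraints are written as `CREATE CONSTRAINT FOR (c:Customer) REQUIRE c.CSN IS UNIQUE`, and `USING PERIODIC COMMIT` has been replaced by `CALL { ... } IN TRANSACTIONS`.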