I have a stream of (id, ids) pairs, where id is the numeric id of a node and ids is the list of ids of its adjacent nodes. I use this query to upsert nodes from the stream:
WITH ${ids.mkString("[", ",", "]")} as ids
UNWIND ids as u2id
MERGE (u1:User {Id:${id}})
MERGE (u2:User {Id:u2id})
CREATE UNIQUE p = (u1) - [:FRIEND] -> (u2)
I also have an index on the Id property of the User label:
CREATE INDEX ON :User(Id)
The ids list contains about 100-200 entries on average.
The database now holds ~60 million nodes and mil. of edges. Upsert speed is about pairs per second. Neo4j runs on a dedicated machine with a Core i5, 28 GB RAM and a 2 TB WD Black.
How can the upsert query be optimized, and are there any tips for improving the hardware?
These progressive changes should make the query faster.
Perform MERGE of u1 just once
Moving the MERGE of u1 before the UNWIND means it is executed only once (instead of once per u2id value).
MERGE (u1:User {Id:${id}})
WITH u1, ${ids.mkString("[", ",", "]")} as ids
UNWIND ids as u2id
MERGE (u2:User {Id:u2id})
CREATE UNIQUE (u1)-[:FRIEND]->(u2)
In addition, use MERGE instead of CREATE UNIQUE
Your relationship creation use case should be satisfiable by MERGE as well as by CREATE UNIQUE (since you ensure that both endpoints exist beforehand). In my profiling, MERGE uses fewer DB hits (your mileage may vary, depending on your DB characteristics and Neo4j version).
MERGE (u1:User {Id:${id}})
WITH u1, ${ids.mkString("[", ",", "]")} as ids
UNWIND ids as u2id
MERGE (u2:User {Id:u2id})
MERGE (u1)-[:FRIEND]->(u2)
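To check the DB-hit difference on your own data, you can prefix either variant with PROFILE. This is only a minimal sketch: the ids 1 and [2, 3, 4] are made-up placeholder values, not taken from the question. PROFILE actually runs the query, so it is probably best tried against a test database first.

```cypher
// PROFILE executes the query and reports the db hits of each operator in the plan.
// The ids 1 and [2, 3, 4] are placeholder values used for illustration only.
PROFILE
MERGE (u1:User {Id: 1})
WITH u1, [2, 3, 4] AS ids
UNWIND ids AS u2id
MERGE (u2:User {Id: u2id})
MERGE (u1)-[:FRIEND]->(u2)
```

Swapping the last line for CREATE UNIQUE (u1)-[:FRIEND]->(u2) and profiling again should let you compare the total db hits of the two plans on your Neo4j version.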