scala neo4j query-optimization

neo4j create list of edges


I have a stream of pairs (id, ids), where id is the numeric id of a node and ids is the list of ids of its adjacent nodes. I use this query to upsert nodes from the stream:

WITH ${ids.mkString("[", ",", "]")} as ids
UNWIND ids as u2id
MERGE (u1:User {Id:${id}})
MERGE (u2:User {Id:u2id})
CREATE UNIQUE p = (u1) - [:FRIEND] -> (u2)

I also have an index on the Id property of the User label:

CREATE INDEX ON :User(Id)

The ids list contains about 100-200 entries on average.

Now there are ~60 million nodes and mil. of edges in the database. Speed of upserting is about pairs per second. Neo4j is running on a dedicated machine with a Core i5, 28 GB RAM and a 2 TB WD Black.

How can the upsert query be optimized, and are there any tips for improving the hardware?


Solution

  • These progressive changes should make the query faster.

    1. Perform MERGE of u1 just once

      By moving the MERGE of u1 before the UNWIND, it will only be executed once (instead of once per u2id value).

      MERGE (u1:User {Id:${id}})
      WITH u1, ${ids.mkString("[", ",", "]")} as ids
      UNWIND ids as u2id
      MERGE (u2:User {Id:u2id})
      CREATE UNIQUE (u1)-[:FRIEND]->(u2)
      
    2. In addition, use MERGE instead of CREATE UNIQUE

      Your relationship creation use case should be satisfiable by MERGE as well as CREATE UNIQUE (since you ensure that both endpoints exist beforehand). In my profiling, I see that MERGE uses fewer DB hits (your mileage may vary, depending on your DB characteristics and neo4j version).

      MERGE (u1:User {Id:${id}})
      WITH u1, ${ids.mkString("[", ",", "]")} as ids
      UNWIND ids as u2id
      MERGE (u2:User {Id:u2id})
      MERGE (u1)-[:FRIEND]->(u2)
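
For reference, the query from step 2 could be assembled on the Scala side roughly as follows. This is only a minimal sketch: the object name FriendUpsert and the method buildQuery are hypothetical, the ids are assumed to be Long values, and the resulting string still has to be sent to Neo4j through whatever client the application already uses.

```scala
object FriendUpsert {
  // Hypothetical helper: builds the step-2 Cypher text for one (id, ids) pair.
  // Assumes all ids are numeric, so they can be interpolated as literals.
  def buildQuery(id: Long, ids: Seq[Long]): String = {
    // Render the adjacent ids as a Cypher list literal, e.g. [2,3]
    val idList = ids.mkString("[", ",", "]")
    s"""MERGE (u1:User {Id:${id}})
       |WITH u1, ${idList} as ids
       |UNWIND ids as u2id
       |MERGE (u2:User {Id:u2id})
       |MERGE (u1)-[:FRIEND]->(u2)""".stripMargin
  }
}
```

For example, FriendUpsert.buildQuery(1L, Seq(2L, 3L)) yields a query whose second line is WITH u1, [2,3] as ids.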