How to avoid duplicate nodes when importing JSON into Neo4J


Let's say I have a JSON file describing relationships between people:

[
    {
        "name": "mike",
        "loves": ["karen", "david", "joy"],
        "loved": ["karen", "joy"]
    },
    {
        "name": "karen",
        "loves": ["mike", "david", "joy"],
        "loved": ["mike"]
    },
    {
        "name": "joy",
        "loves": ["karen"],
        "loved": ["karen", "david"]
    }
]

I want to import the nodes and relationships into a Neo4J DB. In this sample there is only one relationship type ("LOVES"); the two lists on each person only determine the arrow's direction. I use the following query to import the JSON:

UNWIND {json} as person
CREATE (p:Person {name: person.name})
FOREACH (l in person.loves | MERGE (v:Person {name: l}) CREATE (p)-[:LOVES]->(v))
FOREACH (f in person.loved | MERGE (v:Person {name: f}) CREATE (v)-[:LOVES]->(p))

My problem is that I now have duplicate nodes (e.g. two nodes with {name: 'karen'}). I know I could probably use UNIQUE if I inserted records one at a time, but what should I use here when importing a large JSON? (To be clear: the name property is always unique in the JSON, i.e. there are never two "mikes".)


Solution

  • [EDITED]

    Since you cannot assume that a Person node does not yet exist, you need to MERGE your Person nodes everywhere.
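
    The difference between the two clauses can be seen in a small sketch. The names come from the sample data, and the statements are meant to be run one at a time (or in a client that accepts semicolon-separated statements) against an empty scratch database:

    ```cypher
    // CREATE never looks for an existing node: each clause adds a new one,
    // so this statement leaves two :Person nodes named 'karen'.
    CREATE (:Person {name: 'karen'})
    CREATE (:Person {name: 'karen'});

    // MERGE first tries to match the whole pattern and only creates it when
    // nothing matches, so this statement leaves a single node named 'joy'.
    MERGE (:Person {name: 'joy'})
    MERGE (:Person {name: 'joy'});
    ```

    This is why a CREATE for the person itself, combined with a MERGE for the people in the lists, ends up with two copies of anyone who appears in both places.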

    If there is no need to use your loved data (that is, if the loves data is sufficient to create all the necessary relationships):

    UNWIND {json} as person
    MERGE (p:Person {name: person.name})
    FOREACH (l in person.loves | MERGE (v:Person {name: l}) CREATE (p)-[:LOVES]->(v))
    

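    On the other hand, if the loved data is needed, then you need to use MERGE when creating the relationships as well, since any relationship might already exist. In the sample, for instance, mike's loves list contains karen and karen's loved list contains mike, so the same (mike)-[:LOVES]->(karen) relationship is described twice.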

    UNWIND {json} as person
    MERGE (p:Person {name: person.name})
    FOREACH (l in person.loves | MERGE (v:Person {name: l}) MERGE (p)-[:LOVES]->(v))
    FOREACH (f in person.loved | MERGE (v:Person {name: f}) MERGE (v)-[:LOVES]->(p))
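
    A note on versions: the {json} parameter form used in these queries is the older syntax, and Neo4j 4.0 and later accept only the $ form. Assuming the parameter is still named json and is passed in as a list of maps, the same import might look like this on a newer server:

    ```cypher
    // Same import as above, with the parameter written as $json.
    UNWIND $json AS person
    MERGE (p:Person {name: person.name})
    // loves: the arrow goes from this person to the listed person
    FOREACH (l IN person.loves | MERGE (v:Person {name: l}) MERGE (p)-[:LOVES]->(v))
    // loved: the arrow goes from the listed person to this person
    FOREACH (f IN person.loved | MERGE (v:Person {name: f}) MERGE (v)-[:LOVES]->(p))
    ```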
    

    In both cases, you should create an index (or uniqueness constraint) on :Person(name) to speed up the query.
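
    As a rough sketch of that last step: a uniqueness constraint cannot be created while duplicate names are still in the database, so it may help to list them first. The constraint syntax depends on the server version; the first form below matches the older releases that also accept {json}, and the commented-out form is the newer one, with person_name_unique being a made-up constraint name.

    ```cypher
    // List any names that occur on more than one :Person node.
    MATCH (p:Person)
    WITH p.name AS name, count(*) AS copies
    WHERE copies > 1
    RETURN name, copies;

    // Older syntax (Neo4j 3.x and 4.x).
    CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

    // Newer syntax (Neo4j 4.4 and later); the constraint name is hypothetical.
    // CREATE CONSTRAINT person_name_unique IF NOT EXISTS
    // FOR (p:Person) REQUIRE p.name IS UNIQUE;
    ```

    A uniqueness constraint is backed by an index on the same property, so a separate index on :Person(name) is not needed once the constraint is in place.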