Create relationships between ids and ids in nested lists using APOC


I'm trying to represent MongoDB data as a graph in Neo4j using the APOC connector, but I can't work out the correct syntax. My data in MongoDB looks like this:

{
    "_id" : ObjectId("5e88985f788e2ab63ff926d7"),
    "role" : "member",
    "name" : "Emmett Brown",
    "dob" : "1955-03-19",
    "registration_date" : "1985-10-26",
    "follows" : []
},
{
    "_id" : ObjectId("5e88985f788e2ab63ff926d8"),
    "role" : "member",
    "name" : "Marty McFly",
    "dob" : "1968-06-09",
    "registration_date" : "2015-10-26",
    "follows" : [{
        "id" : [ObjectId("5e88985f788e2ab63ff926d7")]
    }]
},
{
    "_id" : ObjectId("5e88985f788e2ab63ff926d9"),
    "role" : "member",
    "name" : "Biff Tannen",
    "dob" : "1959-04-15",
    "registration_date" : "2006-09-15",
    "follows" : [{
        "id" : [ObjectId("5e88985f788e2ab63ff926d7"), ObjectId("5e88985f788e2ab63ff926d8")]
    }]
}

What I'd like to do is create a graph in Neo4j that looks like this:

CREATE (Emmett:Person)
CREATE (Marty:Person)
CREATE (Biff:Person)

CREATE
(Marty)-[:FOLLOWS]->(Emmett),
(Biff)-[:FOLLOWS]->(Emmett),
(Biff)-[:FOLLOWS]->(Marty)

In other words, I'd like each ObjectId under the "follows" key to become the destination node of a FOLLOWS relationship. However, since the documents reference each other by id, I have no idea how to create the relationships. Here's what I've come up with so far:

CALL apoc.mongodb.get('mongodb://localhost:27017', 'database_name', 'user_collection', {}) YIELD value AS person
MERGE (p:Person {name:person.name}) ON CREATE SET p.registration_date = person.registration_date
RETURN p

This returns all my nodes and displays them in Neo4j, but I have spent the past two days trying to get at the ids nested under "follows", and I just can't do it. Could anyone help with this? Thank you in advance!


Solution

  • I don't have a Mongo instance to play with, so I simulated this with a JSON file. Note that I've collapsed the ObjectId values into plain strings, which I think is how Neo4j handles them. You'd need to replace the first line of the query with your call to apoc.mongodb.get.

    [
        {
            "_id" : "5e88985f788e2ab63ff926d7",
            "role" : "member",
            "name" : "Emmett Brown",
            "dob" : "1955-03-19",
            "registration_date" : "1985-10-26",
            "follows" : []
        },
        {
            "_id" : "5e88985f788e2ab63ff926d8",
            "role" : "member",
            "name" : "Marty McFly",
            "dob" : "1968-06-09",
            "registration_date" : "2015-10-26",
            "follows" : [{
                "id" : ["5e88985f788e2ab63ff926d7"]
            }]
        },
        {
            "_id" : "5e88985f788e2ab63ff926d9",
            "role" : "member",
            "name" : "Biff Tannen",
            "dob" : "1959-04-15",
            "registration_date" : "2006-09-15",
            "follows" : [{
                "id" : ["5e88985f788e2ab63ff926d7", "5e88985f788e2ab63ff926d8"]
            }]
        }
    ]

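    The following creates the Person nodes first, then runs a second pass that connects them. Two passes are needed because a document may follow someone who appears later in the collection, so every node has to exist before any relationship is matched: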

    CALL apoc.load.json("example.json") YIELD value AS person
    WITH collect(person) AS people
    // Pass 1: create one Person node per document, keyed on the Mongo _id
    FOREACH (personDetails IN people |
        MERGE (p:Person { id: personDetails._id })
        ON CREATE SET p.registrationDate = personDetails.registration_date,
                      p.name = personDetails.name
    )
    // Pass 2: every node now exists, so link each follower to the people it follows
    WITH people
    UNWIND people AS personDetails
    MATCH (follower:Person { id: personDetails._id })
    UNWIND personDetails.follows AS followsRecords
    MATCH (followed:Person) WHERE followed.id IN followsRecords.id
    MERGE (follower)-[:FOLLOWS]->(followed)
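
    For the Mongo-backed case, a minimal sketch of the same two-pass idea might look like the following. The connection string, database and collection names are copied from the question. The big assumption, the same one made above, is that apoc.mongodb.get hands back _id and the nested ids as plain strings; if your APOC version returns ObjectIds in another shape, the id expressions will need adjusting. This variant also unwinds the inner id list so that the followed node is matched by equality on Person.id.

    ```cypher
    CALL apoc.mongodb.get('mongodb://localhost:27017', 'database_name', 'user_collection', {}) YIELD value AS person
    // Pass 1: one Person node per document (assumes person._id arrives as a string)
    MERGE (p:Person { id: person._id })
    ON CREATE SET p.name = person.name,
                  p.registrationDate = person.registration_date
    // collect() is an aggregation, so pass 1 completes before pass 2 starts
    WITH collect(person) AS people
    UNWIND people AS personDetails
    MATCH (follower:Person { id: personDetails._id })
    UNWIND personDetails.follows AS followsRecord
    UNWIND followsRecord.id AS followedId
    MATCH (followed:Person { id: followedId })
    MERGE (follower)-[:FOLLOWS]->(followed)
    ```

    Documents with an empty "follows" list, such as Emmett Brown's, simply produce no rows in the second pass, so they get a node but no outgoing relationships.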
    

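    As a quick check that the import produced the graph described in the question, a query along these lines lists every FOLLOWS relationship by name. The variable names a and b are arbitrary.

    ```cypher
    // List each follower/followed pair created by the import
    MATCH (a:Person)-[:FOLLOWS]->(b:Person)
    RETURN a.name AS follower, b.name AS followed
    ORDER BY follower, followed
    ```

    With the three sample documents this should return three rows: Biff Tannen following Emmett Brown, Biff Tannen following Marty McFly, and Marty McFly following Emmett Brown.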

    We probably also want a unique constraint on Person.id. It speeds up the id lookups on large datasets and prevents duplicate nodes if we get our query wrong:

    CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE
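
    On recent Neo4j versions the statement above may be rejected: the ON ... ASSERT form was deprecated in Neo4j 4.4 and removed in 5.0. A sketch of the equivalent in the newer syntax follows. The constraint name person_id_unique is an arbitrary choice for illustration, not something from the original answer.

    ```cypher
    // Newer constraint syntax (Neo4j 4.4 and later); the constraint name is hypothetical
    CREATE CONSTRAINT person_id_unique IF NOT EXISTS
    FOR (p:Person) REQUIRE p.id IS UNIQUE
    ```

    The IF NOT EXISTS clause makes the statement safe to re-run alongside the import.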