Bulk write in py2neo from a list


I have the following data, which represents the distances between pairs of objects.

data = [[('123','234'), 10],
        [('134','432'), 12],
       ]

I would like to insert this into Neo4j via py2neo v3:

for e, p in enumerate(data):
    #
    id_left  = p[0][0]
    id_right = p[0][1]    
    distance = p[1]
    #
    left  = Node("_id", id_left)
    right = Node("_id", id_right)
    G.merge(left)
    G.merge(right)
    r = Relationship(left,'TO', right, distance=distance)
    G.create(r)
    #

But I find this to be very, very slow. What's the best way of speeding this up? I've looked around but haven't found any code example that clearly illustrates how to go about it.


Solution

  • You are not using py2neo's Node correctly. Node takes labels as positional arguments and properties as keyword arguments, so Node("_id", id_left) creates a node that has two labels (_id and the ID value itself) and no properties at all.

    This is slow because MERGE has nothing to match on.
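
    To make the argument binding concrete, here is a minimal sketch. describe_node is a hypothetical stand-in that only mimics the Node(*labels, **properties) calling convention; it is not py2neo code and does not talk to a database.

```python
# Hypothetical stand-in for the Node(*labels, **properties) calling convention.
# It only shows how Python binds the arguments; it is not part of py2neo.
def describe_node(*labels, **properties):
    return {"labels": sorted(labels), "properties": dict(properties)}

# Call shape from the question: both values are positional, so both end up as labels.
wrong = describe_node("_id", "123")

# Intended call shape: one label, with the id passed as a keyword property.
right = describe_node("MyNode", id="123")

print(wrong)
print(right)
```

    In the first call the properties map is empty, which is why there is no property for MERGE to match on.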

    Here is a corrected version of your code that uses the label MyNode and a property id:

    from py2neo import Graph, Node, Relationship
    graph = Graph(password="password")
    
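    data = [
        [('123','234'), 10],
        [('134','432'), 12],
    ]

    for (id_left, id_right), distance in data:
        # The label goes first; properties are passed as keyword arguments.
        left  = Node("MyNode", id=id_left)
        right = Node("MyNode", id=id_right)
        # merge() reuses a matching node if one exists, otherwise creates it.
        graph.merge(left)
        graph.merge(right)
        # Connect the pair and store the distance on the relationship.
        r = Relationship(left, 'TO', right, distance=distance)
        graph.create(r)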
    

    This produces MyNode nodes that carry an id property, connected by TO relationships that carry the distance.

    For better performance once you have thousands of MyNode nodes, add a unique constraint on the id property. The constraint is backed by an index, so MERGE can look nodes up by id instead of scanning every MyNode:

    CREATE CONSTRAINT ON (m:MyNode) ASSERT m.id IS UNIQUE;
    

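    This code still makes three calls to Neo4j for every pair (two merges and one create). The most performant approach is to use Cypher directly and send the whole list as a parameter in a single query: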

    data = [
        [('123','234'), 10],
        [('134','432'), 12],
    ]

    # Build one parameter map per pair; UNWIND iterates over this list.
    params = []
    for x in data:
        params.append({"left": x[0][0], "right": x[0][1], "distance": x[1]})
    
    
    q = """
    UNWIND {datas} AS data
    MERGE (m:MyNode {id: data.left })
    MERGE (m2:MyNode {id: data.right })
    MERGE (m)-[r:TO]->(m2)
    SET r.distance = data.distance
    """
    
    graph.run(q, { "datas": params })
    

    This results in the same graph as above, but with a single call to Neo4j instead of three per pair.
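
    For very large lists, one common variation is to send the rows in batches rather than in a single query, so that no single transaction grows too large. The sketch below is an assumption-laden illustration, not part of the original answer: to_params, batches, bulk_write and the default batch size of 1000 are all hypothetical names and values. The run argument is any callable shaped like graph.run, which lets the helper be tried without a database.

```python
from itertools import islice

# Same Cypher shape as the UNWIND query in the answer (Neo4j 3.x parameter syntax).
QUERY = """
UNWIND {datas} AS data
MERGE (m:MyNode {id: data.left})
MERGE (m2:MyNode {id: data.right})
MERGE (m)-[r:TO]->(m2)
SET r.distance = data.distance
"""

def to_params(data):
    """Turn [[(left, right), distance], ...] into a list of maps for UNWIND."""
    return [{"left": left, "right": right, "distance": distance}
            for (left, right), distance in data]

def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_write(run, data, batch_size=1000):
    """Call run(query, params) once per batch and return the number of calls.

    batch_size=1000 is an arbitrary illustrative default, not a tuned value.
    """
    calls = 0
    for chunk in batches(to_params(data), batch_size):
        run(QUERY, {"datas": chunk})
        calls += 1
    return calls
```

    With a py2neo graph this would be called as bulk_write(graph.run, data). On Neo4j 4.0 and later the {datas} placeholder is no longer accepted and would need to be written as $datas.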