Efficient way to create relationships in Neo4j


I have a Neo4j database populated with thousands of nodes but no relationships. I also have a file that lists relationships between nodes, and I would like to create those relationships between the nodes that already exist in the database. My current approach is:

from py2neo import Graph, NodeSelector, Relationship

graph = Graph('http://127.0.0.1:7474/db/data')
tx = graph.begin()
selector = NodeSelector(graph)
with open("file", "r") as relations:
    for line in relations:
        # Strip the trailing newline so the second name matches exactly
        line_split = line.strip().split(";")
        # Two lookups per line: one query for each endpoint
        node1 = selector.select("Node", unique_name=line_split[0]).first()
        node2 = selector.select("Node", unique_name=line_split[1]).first()
        rs = Relationship(node1, "Relates to", node2)
        tx.create(rs)
tx.commit()

This approach needs two queries to the database per line to fetch the nodes, plus the relationship creation itself. Is there a more efficient way, given that the nodes already exist in the database?


Solution

  • You can use some form of node caching while populating relations:

    from py2neo import Graph, NodeSelector, Relationship

    graph = Graph('http://127.0.0.1:7474/db/data')
    tx = graph.begin()
    selector = NodeSelector(graph)
    node_cache = {}

    def get_node(name):
        # Return the cached node, querying the database only on a cache miss
        if name not in node_cache:
            node_cache[name] = selector.select("Node", unique_name=name).first()
        return node_cache[name]

    with open("file", "r") as relations:
        for line in relations:
            # Strip the trailing newline so the second name matches exactly
            line_split = line.strip().split(";")
            node1 = get_node(line_split[0])
            node2 = get_node(line_split[1])
            rs = Relationship(node1, "Relates to", node2)
            tx.create(rs)

    tx.commit()

    With the above, each node is loaded at most once, and only if it appears in your input file.
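
A different option, not part of the answer above, is to skip fetching nodes into Python altogether and let the database match both endpoints itself, sending the pairs in batches through a single Cypher statement. The following is only a rough sketch under several assumptions: the helper names (`read_pairs`, `batches`, `create_relationships`) and the batch size of 1000 are made up for illustration, the server is assumed to accept the `$pairs` parameter syntax, and `MERGE` is used instead of `CREATE` so that rerunning the import does not duplicate relationships. The only py2neo call relied on is `graph.run(cypher, parameters)`.

```python
# Hypothetical helpers for illustration; none of these names come from py2neo.

# Assumes nodes carry the label "Node" and a "unique_name" property,
# as in the question. MERGE (rather than CREATE) is a choice made here.
QUERY = """
UNWIND $pairs AS pair
MATCH (a:Node {unique_name: pair[0]})
MATCH (b:Node {unique_name: pair[1]})
MERGE (a)-[:`Relates to`]->(b)
"""


def read_pairs(lines):
    """Yield [source, target] name pairs from 'source;target' lines."""
    for line in lines:
        parts = line.strip().split(";")
        # Skip blank or malformed lines instead of failing on them
        if len(parts) >= 2 and parts[0] and parts[1]:
            yield [parts[0], parts[1]]


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def create_relationships(graph, lines, batch_size=1000):
    """Send one query per batch; return the number of pairs submitted.

    The return value counts pairs sent to the server, not relationships
    actually created: pairs whose nodes are not found match nothing.
    """
    submitted = 0
    for batch in batches(read_pairs(lines), batch_size):
        graph.run(QUERY, {"pairs": batch})
        submitted += len(batch)
    return submitted


# Possible usage, assuming the same connection details as in the question:
#   from py2neo import Graph
#   graph = Graph('http://127.0.0.1:7474/db/data')
#   with open("file", "r") as relations:
#       create_relationships(graph, relations)
```

With this shape the number of round trips depends on the number of batches rather than the number of lines. The lookups on `unique_name` still happen on the server, so an index or uniqueness constraint on that property would likely matter for speed.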