I have a Neo4j database populated with thousands of nodes, but no relationships are defined yet. I also have a file that lists the relationships between nodes, and I would like to create those relationships between the nodes that already exist in the database. My current approach is:
    from py2neo import NodeSelector, Graph, Node, Relationship

    graph = Graph('')
    tx = graph.begin()
    selector = NodeSelector(graph)
    with open("file", "r") as relations:
        for line in relations:
            # Assumes each line holds two whitespace-separated node names
            line_split = line.split()
            node1 = selector.select("Node", unique_name=line_split[0]).first()
            node2 = selector.select("Node", unique_name=line_split[1]).first()
            rs = Relationship(node1, "Relates to", node2)
            # Queue the relationship in the open transaction
            tx.create(rs)
    tx.commit()
This approach needs two queries to the database for every line of the file, just to fetch the nodes, plus the relationship creation itself. Is there a more efficient way, given that the nodes already exist in the database?
You can cache the nodes while creating the relationships, so that each node is queried at most once:
    from py2neo import NodeSelector, Graph, Node, Relationship

    graph = Graph('')
    tx = graph.begin()
    selector = NodeSelector(graph)
    node_cache = {}
    with open("file", "r") as relations:
        for line in relations:
            # Assumes each line holds two whitespace-separated node names
            line_split = line.split()
            # Check if we have this node in the cache
            if line_split[0] in node_cache:
                node1 = node_cache[line_split[0]]
            else:
                # Query and store for later
                node1 = selector.select("Node", unique_name=line_split[0]).first()
                node_cache[line_split[0]] = node1
            if line_split[1] in node_cache:
                node2 = node_cache[line_split[1]]
            else:
                node2 = selector.select("Node", unique_name=line_split[1]).first()
                node_cache[line_split[1]] = node2
            rs = Relationship(node1, "Relates to", node2)
            # Queue the relationship in the open transaction
            tx.create(rs)
    tx.commit()
With the above, each node is loaded at most once, and only if it appears in your input file.
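The caching logic can also be pulled into a small helper, which removes the duplicated `if`/`else` blocks and makes the "one lookup per distinct name" behaviour easy to check without a database. This is only a sketch: `resolve_pairs` and `fake_lookup` are hypothetical names, the whitespace-separated line format is an assumption, and the stand-in lookup returns plain dicts instead of py2neo nodes.

```python
def resolve_pairs(lines, lookup):
    """Turn 'name1 name2' lines into (node1, node2) pairs.

    `lookup` is any callable mapping a unique name to a node; it is
    called at most once per distinct name thanks to the cache.
    """
    cache = {}

    def get(name):
        # Query only on a cache miss, then remember the result
        if name not in cache:
            cache[name] = lookup(name)
        return cache[name]

    pairs = []
    for line in lines:
        parts = line.split()  # assumed format: two whitespace-separated names
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pairs.append((get(parts[0]), get(parts[1])))
    return pairs


# Stand-in for the database query, recording every call it receives
calls = []

def fake_lookup(name):
    calls.append(name)
    return {"unique_name": name}

pairs = resolve_pairs(["a b", "b c", "a c"], fake_lookup)
print(len(pairs), calls)
```

With py2neo, `lookup` could be something like `lambda name: selector.select("Node", unique_name=name).first()`, and each resulting pair would then be passed to `Relationship(node1, "Relates to", node2)` as in the code above. A name that does not match any node would need separate handling before the relationship is created.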