I'm trying to load data into a Neo4j database from an XML file using py2neo.
This Python script works, but it's too slow: I add all the nodes first and then the relationships, with two exception handlers. On top of that, the XML file is around 200 MB.
I'm wondering whether there is a faster way to perform this task.
XML file:
<Persons>
<person>
<id>XA123</id>
<first_name>Adam</first_name>
<last_name>John</last_name>
<phone>01-12322222</phone>
</person>
<person>
<id>XA7777</id>
<first_name>Anna</first_name>
<last_name>Watson</last_name>
<relationship>
<type>Friends</type>
<to>XA123</to>
</relationship>
</person>
</Persons>
Python script:
#!/usr/bin/python3
from xml.dom import minidom
from py2neo import Graph, Node, Relationship, authenticate

# py2neo 2.x: register the credentials, then connect
authenticate("localhost:7474", "neo4j", "admin")
graph = Graph("http://localhost:7474/db/data/")

# minidom builds the whole document in memory before anything is loaded
xml_doc = minidom.parse("data.xml")
persons = xml_doc.getElementsByTagName('person')

# Adding Nodes
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data
    fName = person.getElementsByTagName('first_name')[0].firstChild.data
    lName = person.getElementsByTagName('last_name')[0].firstChild.data
    # not every person has a phone number
    try:
        phone = person.getElementsByTagName('phone')[0].firstChild.data
    except IndexError:
        phone = "None"
    label = "Person"
    node = Node(label, ID=ID_, FirstName=fName, LastName=lName, Phone=phone)
    graph.create(node)

# Adding Relationships
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data
    label = "Person"
    # one lookup per person, plus one per relationship target below
    node1 = graph.find_one(label, property_key="ID", property_value=ID_)
    # relationships (getElementsByTagName returns an empty list when there are none)
    try:
        has_relations = person.getElementsByTagName('relationship')
        for relation in has_relations:
            node2 = graph.find_one(label,
                                   property_key="ID",
                                   property_value=relation.getElementsByTagName('to')[0].firstChild.data)
            relationship = Relationship(node1,
                                        relation.getElementsByTagName('type')[0].firstChild.data, node2)
            graph.create(relationship)
    except IndexError:
        # a <relationship> element without <to> or <type>
        continue
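Because minidom.parse has to build the entire 200 MB document in memory before the first node is created, a streaming parser is one option worth trying. The following is a minimal sketch using the standard library's xml.etree.ElementTree.iterparse; the function name iter_persons and the SAMPLE_XML data are illustrative only, and the property names follow the script above.

```python
import io
import xml.etree.ElementTree as ET

# Small sample in the same shape as the question's data.xml (illustrative)
SAMPLE_XML = b"""<Persons>
<person><id>XA123</id><first_name>Adam</first_name><last_name>John</last_name><phone>01-12322222</phone></person>
<person><id>XA7777</id><first_name>Anna</first_name><last_name>Watson</last_name>
<relationship><type>Friends</type><to>XA123</to></relationship></person>
</Persons>"""


def iter_persons(source):
    """Yield (properties, relationships) for each <person>, one at a time.

    `source` is a filename or a binary file object. `relationships` is a
    list of (type, target_id) tuples taken from <relationship> children.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "person":
            continue  # skip the end events of <id>, <phone>, etc.
        props = {
            "ID": elem.findtext("id"),
            "FirstName": elem.findtext("first_name"),
            "LastName": elem.findtext("last_name"),
            # a missing <phone> falls back to the same "None" string the script uses
            "Phone": elem.findtext("phone", default="None"),
        }
        rels = [(rel.findtext("type"), rel.findtext("to"))
                for rel in elem.findall("relationship")]
        yield props, rels
        # discard this person's children now that they have been read
        elem.clear()


if __name__ == "__main__":
    for props, rels in iter_persons(io.BytesIO(SAMPLE_XML)):
        print(props["ID"], rels)
```

With a real file you would pass the filename, e.g. iter_persons("data.xml"), and collect or batch the results instead of printing them.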
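The time needed to load the data into Neo4j was significantly reduced by adding a unique property constraint on the ID property of the Person label (a uniqueness constraint is backed by an index, so the find_one lookups on ID no longer have to scan every Person node):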
graph.cypher.execute("CREATE CONSTRAINT ON (n:Person) ASSERT n.ID IS UNIQUE")
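Beyond the constraint, another common way to cut the number of round trips is to send rows in batches, one UNWIND statement per batch, instead of one create or find_one call per node. The sketch below only builds the statements and batches; execute is a hypothetical callable that runs one Cypher statement with a rows parameter. With py2neo 2.x, graph.cypher.execute may fit, since it accepts parameters as keyword arguments, but treat that as an assumption. Cypher cannot parameterize a relationship type, so relationships are grouped by type. The {rows} syntax matches older Neo4j servers; newer versions expect $rows.

```python
from collections import defaultdict
from itertools import islice

# One statement per batch of nodes; MERGE on ID benefits from the unique constraint.
# {rows} is the older parameter syntax; newer Neo4j versions expect $rows.
NODE_QUERY = (
    "UNWIND {rows} AS row "
    "MERGE (p:Person {ID: row.ID}) "
    "SET p.FirstName = row.FirstName, p.LastName = row.LastName, "
    "p.Phone = row.Phone"
)


def rel_query(rel_type):
    """Build a batched relationship statement for one relationship type.

    Cypher cannot take a relationship type as a parameter, so it is spliced
    into the text; backticks are doubled so the name stays a quoted identifier.
    """
    safe = rel_type.replace("`", "``")
    return ("UNWIND {rows} AS row "
            "MATCH (a:Person {ID: row.from_id}) "
            "MATCH (b:Person {ID: row.to_id}) "
            "MERGE (a)-[:`" + safe + "`]->(b)")


def batches(iterable, size):
    """Split any iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load(persons, rels, execute, batch_size=1000):
    """persons: iterable of property dicts; rels: (from_id, type, to_id) tuples.

    `execute(query, rows=...)` is a hypothetical hook that runs one Cypher
    statement with a `rows` parameter.
    """
    # All nodes first, so the MATCH clauses below can find both endpoints
    for chunk in batches(persons, batch_size):
        execute(NODE_QUERY, rows=chunk)

    # Group relationships by type, since each type needs its own statement
    by_type = defaultdict(list)
    for from_id, rel_type, to_id in rels:
        by_type[rel_type].append({"from_id": from_id, "to_id": to_id})

    for rel_type, rows in by_type.items():
        query = rel_query(rel_type)
        for chunk in batches(rows, batch_size):
            execute(query, rows=chunk)
```

The batch size of 1000 is an arbitrary starting point; larger batches mean fewer round trips but bigger transactions, so it is worth measuring against your own data.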