I'm inserting hierarchical data from a DOM tree into a graph database, but I can't obtain the parent's ID, which I need in order to create a relationship between a child and its parent.
Below is code that traverses the DOM nodes, inserts each tag, and obtains the last inserted ID. I need the IDs of both the child and the parent in order to create their relationship.
from lxml import html
import age # from AgensGraph
from age.gen.ageParser import *

GRAPH_NAME = "demo_graph"
DSN = "host=localhost port=5432 dbname=demodb user=userdemo password=demo234"

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():
        parent = None
    cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
    b = [x[0].id for x in cursor] # get last inserted ID
    print(b[0])
    ag.execCypher("MATCH (c:node), (p:node) WHERE c.id = %s AND p.id = %s CREATE (c)-[r:connects]->(p)") # match child node c and parent node p, then create c-connects->p (p's ID is unknown)
Here is the demo file: demo.html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>Document</title>
  </head>
  <body>
    <ul class="menu">
      <div class="itm">home</div>
      <div class="itm">About us</div>
      <div class="itm">Contact us</div>
    </ul>
    <div id="idone" class="classone">
      <li class="item1">First</li>
      <li class="item2">Second</li>
      <li class="item3">Third</li>
      <div id="innerone"><h1>This Title</h1></div>
      <div id="innertwo"><h2>Subheads</h2></div>
    </div>
    <div id="second" class="below">
      <div class="inner">
        <h1>welcome</h1>
        <h1>another</h1>
        <h2>third</h2>
      </div>
    </div>
  </body>
</html>
Here is the extracted DOM Tree:
tag: head attrib: None parent: html
tag: meta attrib: ('charset', 'UTF-8') parent: head
tag: title attrib: None parent: head
tag: body attrib: None parent: html
tag: h1 attrib: None parent: div
tag: h1 attrib: None parent: div
tag: h2 attrib: None parent: div
/tmp/ipykernel_27254/2858024143.py:4: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if parent := element.getparent():
A CREATE statement only takes effect once the session is committed, so you should call commit() after execCypher(...):
cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
b = [x[0].id for x in cursor]
ag.commit()
Try the following code:
ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():
        parent = None
    cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
    b = [x[0].id for x in cursor] # get last inserted ID
    ag.commit()
    print(b[0])
    ag.execCypher("MATCH (c:node), (p:node) WHERE c.id = %s AND p.id = %s CREATE (c)-[r:connects]->(p)") # match child node c and parent node p, then create c-connects->p (p's ID is unknown)
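As for the parent's ID itself, one possible approach is to remember the ID returned for every element as you insert it, so that by the time you reach a child its parent's ID is already known. The sketch below is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree and a small inline document so that it runs without lxml or a database, and `build_edges`, `create_node` and `fake_create_node` are hypothetical names, not part of the age driver.

```python
import itertools
import xml.etree.ElementTree as ET

# Small well-formed stand-in for demo.html (assumption: not the real file).
SAMPLE = (
    "<html><head><title>Document</title></head>"
    "<body><div><h1>welcome</h1></div></body></html>"
)


def build_edges(root, create_node):
    """Insert every element and return (child_id, parent_id) pairs.

    create_node is a hypothetical callable that inserts one tag and
    returns the ID assigned to it by the database.
    """
    ids = {root: create_node(root.tag)}  # element -> ID it was inserted with
    edges = []
    # iter() walks in document order, so each parent already has an ID
    # in `ids` by the time its children are inserted.
    for parent in root.iter():
        for child in parent:
            ids[child] = create_node(child.tag)
            edges.append((ids[child], ids[parent]))
    return edges


# Stand-in for the database: hands out 1, 2, 3, ... as IDs.
counter = itertools.count(1)


def fake_create_node(tag):
    return next(counter)


edges = build_edges(ET.fromstring(SAMPLE), fake_create_node)
for child_id, parent_id in edges:
    print(child_id, "connects", parent_id)
```

In the real script, `create_node` would presumably wrap the CREATE ... RETURN t call and the x[0].id lookup shown above, and each (child_id, parent_id) pair would be passed as the params of the MATCH ... CREATE query. If you keep the lxml loop instead, the same dictionary can be looked up with element.getparent(); test the result with `is not None` rather than by truthiness, which is what the FutureWarning is pointing at.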