Here's my problem: suppose I have an HTML file containing a table like the one below:
<table>
<tr>
<td> keyword1 </td>
<td>
<p> paragraph 1 </p>
</td>
</tr>
<tr>
<td> keyword2 </td>
<td>
<p> paragraph 2 </p>
<p> paragraph 3 </p>
</td>
</tr>
<tr>
<td> keyword3 </td>
<td>
<p> paragraph 1 </p>
<p> paragraph 3 </p>
</td>
</tr>
</table>
I use the following code to extract the information from the HTML:
CALL apoc.load.html("file:///input_HTML.html",{kwords:"table tr td:eq(1)",
paragraphs:"table tr td:eq(2)",paragraphsList:"table tr td:eq(2) p"}) YIELD value
What I would like to end up with is, for each row of the table, something like the statement below, but created dynamically while reading the HTML file:
CREATE (:kwords {name:"keyword1"})-[:APPEARS_IN]->(:paragraph {name:"paragraph 1"})
The tricky part is getting the paragraph names. Any hints?
You need to go after the td element with an index of 1 for the paragraphs (and index 0 for the keywords); the element index starts at 0.
...
paragraphs:"table tr td:eq(1)",paragraphsList:"table tr td:eq(1) p"
...
But I'm not sure that alone gets you what you want.
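To make the 0-based indexing concrete outside Neo4j, here is a rough offline sketch using Python's standard-library html.parser (not APOC, and not how apoc.load.html works internally). It numbers each row's td cells from 0, the way :eq() does, and pairs the keyword in cell 0 with every p in cell 1. RowCollector and keyword_paragraph_pairs are hypothetical names made up for this illustration, and TABLE_HTML is the question's table condensed onto fewer lines.

```python
from html.parser import HTMLParser

# The question's table, condensed onto fewer lines.
TABLE_HTML = """
<table>
<tr><td> keyword1 </td><td><p> paragraph 1 </p></td></tr>
<tr><td> keyword2 </td><td><p> paragraph 2 </p><p> paragraph 3 </p></td></tr>
<tr><td> keyword3 </td><td><p> paragraph 1 </p><p> paragraph 3 </p></td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collects, per <tr>, its <td> cells in document order (index 0, 1, ...).

    Each cell is a dict {"text": ..., "paras": [...]}: "text" holds data found
    directly in the cell, "paras" holds the text of each <p> inside it.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None    # the <td> currently open, if any
        self._in_p = False   # True while inside a <p> of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = {"text": "", "paras": []}
            self.rows[-1].append(self._cell)
        elif tag == "p" and self._cell is not None:
            self._in_p = True
            self._cell["paras"].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._cell is None:
            return  # whitespace between rows/cells is ignored
        if self._in_p:
            self._cell["paras"][-1] += data
        else:
            self._cell["text"] += data


def keyword_paragraph_pairs(html):
    """Return (keyword, paragraph) pairs, one per <p> in each row's cell 1."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    pairs = []
    for cells in parser.rows:
        keyword = cells[0]["text"].strip()   # analogous to td:eq(0)
        for para in cells[1]["paras"]:       # analogous to td:eq(1) p
            pairs.append((keyword, para.strip()))
    return pairs


if __name__ == "__main__":
    for kw, para in keyword_paragraph_pairs(TABLE_HTML):
        print(kw, "->", para)
```

Each pair is one APPEARS_IN relationship you would want in the graph; note that cell 2 never exists in these rows, which is why td:eq(2) finds nothing.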
How about getting the keywords in one pass and then selecting the paragraphs for each keyword in a second pass?
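// Pass 1: the text of the first cell (index 0) in every row is a keyword
CALL apoc.load.html("file:///input_HTML.html",{kwords: "tr td:eq(0)"}) YIELD value
UNWIND value.kwords AS kw
WITH kw.text AS kw
// Pass 2: for that keyword, take every <p> in the second cell (index 1) of the row containing it
CALL apoc.load.html("file:///input_HTML.html",{paras: 'tr:contains(' + kw + ') td:eq(1) p'}) YIELD value
UNWIND value.paras AS para
// MERGE rather than CREATE, so a paragraph shared by several keywords becomes a single node
MERGE (k:kwords {name: kw })
MERGE (p:paragraph {name: para.text})
MERGE (k)-[:APPEARS_IN]->(p)
RETURN *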
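The MERGE statements are what make shared paragraphs collapse into one node. As a simplified, hypothetical model of that behaviour (plain Python sets standing in for the graph, ignoring constraints, properties beyond name, and concurrency), this sketch replays the five keyword/paragraph pairs from the table. merge_graph is an illustrative name, not part of APOC or Neo4j.

```python
# (keyword, paragraph) pairs as they come out of the question's table.
PAIRS = [
    ("keyword1", "paragraph 1"),
    ("keyword2", "paragraph 2"),
    ("keyword2", "paragraph 3"),
    ("keyword3", "paragraph 1"),
    ("keyword3", "paragraph 3"),
]


def merge_graph(pairs):
    """Model MERGE: a node or relationship is added only if an identical one
    is not already present, so repeated names collapse into one node."""
    keyword_nodes = set()
    paragraph_nodes = set()
    appears_in = set()
    for kw, para in pairs:
        keyword_nodes.add(kw)        # MERGE (k:kwords {name: kw})
        paragraph_nodes.add(para)    # MERGE (p:paragraph {name: para.text})
        appears_in.add((kw, para))   # MERGE (k)-[:APPEARS_IN]->(p)
    return keyword_nodes, paragraph_nodes, appears_in


if __name__ == "__main__":
    kws, paras, rels = merge_graph(PAIRS)
    print(len(kws), "kwords nodes,", len(paras), "paragraph nodes,",
          len(rels), "APPEARS_IN relationships")
```

Under this model the table yields three kwords nodes, three paragraph nodes and five APPEARS_IN relationships, with paragraph 1 linked from both keyword1 and keyword3. Issuing one CREATE per pair, as in the question's example statement, would instead produce ten separate nodes.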