Tags: neo4j, cypher, neo4j-apoc

Comparing List to String in neo4j, reading from HTML


Here's my problem. Suppose I have an HTML file containing a table like the one below:

<table>
    <tr>
        <td> keyword1 </td>
        <td>
            <p> paragraph 1 </p>
        </td>
    </tr>
    <tr>
        <td> keyword2 </td>
        <td>
            <p> paragraph 2 </p>
            <p> paragraph 3 </p>
        </td>
    </tr>
    <tr>
        <td> keyword3 </td>
        <td>
            <p> paragraph 1 </p>
            <p> paragraph 3 </p>
        </td>
    </tr>
</table>

I use the following code to extract the information from the HTML:

CALL apoc.load.html("file:///input_HTML.html",{kwords:"table tr td:eq(1)",
paragraphs:"table tr td:eq(2)",paragraphsList:"table tr td:eq(2) p"}) YIELD value
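To see what each selector in that call actually matches, the result can be reduced to the text of every matched element. This is a minimal sketch, assuming (as the answer below also does) that apoc.load.html returns, for each selector key, a list of element maps that carry a text entry; the selectors are the ones from the call above.

```cypher
// Show only the text of each element matched by each jsoup selector.
// Note: jsoup's :eq() index is 0-based, so td:eq(1) is the SECOND cell of a row.
CALL apoc.load.html("file:///input_HTML.html",
  {kwords: "table tr td:eq(1)",
   paragraphs: "table tr td:eq(2)",
   paragraphsList: "table tr td:eq(2) p"}) YIELD value
RETURN [e IN value.kwords | e.text] AS kwords,
       [e IN value.paragraphs | e.text] AS paragraphs,
       [e IN value.paragraphsList | e.text] AS paragraphsList
```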

What I would like to end up with is, for each row of the table, something similar to the statement below, but created dynamically while reading the HTML file:

CREATE (:kwords {name:"keyword1"})-[:APPEARS_IN]->(:paragraph {name:"paragraph1"})
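Written out by hand for a row that has several paragraphs (here keyword3), the target would look roughly like the sketch below. It uses MERGE instead of CREATE so that "paragraph 1", which also appears under keyword1, stays a single node; the paraText variable name is only illustrative.

```cypher
// Hand-written target for the third table row: keyword3 -> paragraphs 1 and 3.
// MERGE matches an existing node or relationship if there is one, otherwise creates it,
// so running this after the keyword1 row does not duplicate "paragraph 1".
MERGE (k:kwords {name: "keyword3"})
WITH k
UNWIND ["paragraph 1", "paragraph 3"] AS paraText
MERGE (p:paragraph {name: paraText})
MERGE (k)-[:APPEARS_IN]->(p)
```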

The tricky part is getting the paragraph names that belong to each keyword. Any hints?


Solution

  • The element index used by :eq() starts at 0, so for the paragraphs you need to go after the td element with index 1 (and index 0 for the keywords).

    ...
    paragraphs:"table tr td:eq(1)",paragraphsList:"table tr td:eq(1) p"
    ...
    

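    Even with the corrected indexes, though, each selector returns its own flat list, so paragraphsList no longer records which row, and therefore which keyword, each <p> came from.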

    Instead, get the keywords in one pass and then, for each keyword, select only the paragraphs in that keyword's row in a second pass:

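    // Pass 1: the keyword is the first cell (index 0) of each row
    CALL apoc.load.html("file:///input_HTML.html",{kwords: "tr td:eq(0)"}) YIELD value
    UNWIND value.kwords AS kw
    WITH kw.text AS kw
    // Pass 2: re-read the file, keep only the row that contains this keyword and
    // take every <p> in its second cell (index 1). Note that :contains() is a
    // case-insensitive substring match, so "keyword1" would also match "keyword10".
    CALL apoc.load.html("file:///input_HTML.html",{paras: 'tr:contains(' + kw + ') td:eq(1) p'}) YIELD value
    UNWIND value.paras AS para
    // MERGE so that a paragraph shared by several keywords ends up as one node
    MERGE (k:kwords {name: kw })
    MERGE (p:paragraph {name: para.text})
    MERGE (k)-[:APPEARS_IN]->(p)
    RETURN *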
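After the import, one way to check that each keyword picked up only the paragraphs from its own row is to group the relationships per keyword. A minimal sketch using the kwords and paragraph labels and the APPEARS_IN relationship type from the query above; each keyword should list only the paragraphs from its own table row.

```cypher
// One result row per keyword, with the names of all paragraphs linked to it
MATCH (k:kwords)-[:APPEARS_IN]->(p:paragraph)
RETURN k.name AS keyword, collect(p.name) AS paragraphs
ORDER BY keyword
```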