java html xpath jsoup

HTML extract bare text node following H2 element in Body


I am trying to extract the value of a text node within the HTML body element. It immediately follows a known h2 tag, which I can find using h2[text() = 'A Heading'], but I cannot figure out how to get the text node that follows it, that is, the text "I would like to know how to specify an XPath expression for this text." in the example below.

I am using Java and jsoup, but any tool will do, preferably a Java-based one.

Any assistance is appreciated.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Finding Text following H2 tag</title>
  </head>
  <body>
    Some text.
    <h2>A Heading</h2>
    I would like to know how to specify an 
    XPath expression for this text.
    <h2>Another Heading</h2>
    Some more text.
  </body>
</html>

Solution

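  • You can try the expression below. It selects every text node that follows any h2 as a sibling, so on the sample page it returns the text after both headings. To keep only the text right after the known heading, narrow it to //h2[text() = 'A Heading']/following-sibling::text()[1].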

    //h2/following-sibling::text()
    

    Output:

    node:

    Some text.

    A Heading

    I would like to know how to specify an XPath expression for this text.

    Another Heading

    Some more text.

    text value:

    I would like to know how to specify an XPath expression for this text.
    Some more text.
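
Since the question asks for Java, here is a minimal sketch of evaluating such an expression with the JDK's built-in javax.xml.xpath API. It makes several assumptions that are not part of the original post: the class name FollowingText and the method textAfterHeading are hypothetical, the sample page has been made well-formed XML (meta tag closed, DOCTYPE dropped) so the JDK parser accepts it, and the expression is narrowed with the predicate from the question plus [1] so that only the text right after the chosen heading comes back. normalize-space() is used to collapse the line break inside the text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FollowingText {

    // Well-formed variant of the page from the question (assumption:
    // the meta tag is closed and the DOCTYPE is omitted).
    static final String SAMPLE =
          "<html lang=\"en\">"
        + "<head><meta charset=\"utf-8\"/>"
        + "<title>Finding Text following H2 tag</title></head>"
        + "<body>\n"
        + "    Some text.\n"
        + "    <h2>A Heading</h2>\n"
        + "    I would like to know how to specify an \n"
        + "    XPath expression for this text.\n"
        + "    <h2>Another Heading</h2>\n"
        + "    Some more text.\n"
        + "</body></html>";

    // Returns the whitespace-normalized text node that directly follows
    // the h2 whose text equals the given heading, or "" if there is none.
    static String textAfterHeading(String xml, String heading) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $heading as an XPath variable instead of concatenating
        // it into the expression, so quotes in the heading are harmless.
        xpath.setXPathVariableResolver(name -> heading);

        String expr =
            "normalize-space(//h2[text() = $heading]/following-sibling::text()[1])";
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textAfterHeading(SAMPLE, "A Heading"));
        // prints: I would like to know how to specify an XPath expression for this text.
    }
}
```

Real-world HTML is usually not well-formed XML, so this parser would reject the page exactly as posted. One option, if jsoup is already on the classpath, is to parse with jsoup first and either convert the result to a W3C DOM with its W3CDom helper before running the XPath, or skip XPath and read the heading element's nextSibling() directly.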