
HtmlUnit: iterate over nodes


I'm trying to iterate over HTML nodes and get information from each of them.

This is an HTML example:

<div class="less1">
  <h4>Test name 1</h4>
  <div>
     <div id="email">[email protected]</div>
     <div id="email">[email protected]</div>
     <div id="email">[email protected]</div>
  </div>
</div>
<div class="less1">
  <h4>Test name 2</h4>
  <div>
     <div id="email">[email protected]</div>
     <div id="email">[email protected]</div>
     <div id="email">[email protected]</div>
  </div>
</div>
<div class="less1">
  <h4>Test name 3</h4>
  <div>
     <div id="email">[email protected]</div>
  </div>
</div>
<div class="less1">
  <h4>Test name 4</h4>
</div>

This is my code example.

final List<HtmlDivision> nodes = htmlPage.getByXPath("//*[@class=\"less1\"]");

for (HtmlDivision node : nodes) {
   final List<?> divs = node.getByXPath("//h4/text()");
}

The divs list always has size 4, no matter which node I'm on.

Is it possible to get only the one result that belongs to the current node?


Solution

To get only the first matching element, use getFirstByXPath. Note that it returns a single object, not a List:

final DomText text = node.getFirstByXPath("//h4/text()"); // "//" still starts at the document root

If you need a specific match by index:

final Object match = node.getByXPath("//h4/text()").get(index);
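
To see why getFirstByXPath alone still doesn't give a per-node result, here is a minimal sketch using the JDK's own javax.xml.xpath instead of HtmlUnit, so it runs without HtmlUnit on the classpath. It assumes a trimmed copy of the markup wrapped in <body> so it parses as XML; FirstMatchDemo and its methods are made-up names. XPathConstants.NODE plays the role of getFirstByXPath (first match in document order), and NODESET plus item(index) the role of getByXPath(...).get(index):

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstMatchDemo {
    // Trimmed copy of the question's markup, wrapped in <body> so it parses as XML.
    static final String HTML = "<body>"
            + "<div class='less1'><h4>Test name 1</h4></div>"
            + "<div class='less1'><h4>Test name 2</h4></div>"
            + "</body>";

    /** Analogue of getFirstByXPath: first match in document order, or null. */
    static String firstFromDiv(int divIndex, String expr) {
        Node n = (Node) eval(divIndex, expr, XPathConstants.NODE);
        return n == null ? null : n.getNodeValue();
    }

    /** Analogue of getByXPath(expr).get(index), returning null when out of range. */
    static String nthFromDiv(int divIndex, String expr, int index) {
        NodeList all = (NodeList) eval(divIndex, expr, XPathConstants.NODESET);
        Node n = index < all.getLength() ? all.item(index) : null;
        return n == null ? null : n.getNodeValue();
    }

    /** Parses HTML and evaluates expr with the divIndex-th <div> as the context node. */
    private static Object eval(int divIndex, String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            Node context = doc.getElementsByTagName("div").item(divIndex);
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, context, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The context is the second div, but "//" restarts at the document root.
        System.out.println(firstFromDiv(1, "//h4/text()"));  // prints "Test name 1"
        System.out.println(nthFromDiv(1, "//h4/text()", 1)); // prints "Test name 2"
    }
}
```

Even with the second div as context, the "first" match is the first <h4> of the whole document.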
    

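UPDATE

The problem is most likely the absolute XPath. An expression that starts with "//" is evaluated from the root of the document, not from node, so "//h4/text()" matches all four <h4> elements whichever node you call it on. Use a path relative to each node instead (or ".//h4/text()" to search all descendants of node):

final DomText text = node.getFirstByXPath("h4/text()");        // one result per node
final List<HtmlDivision> emails = node.getByXPath("div/div");  // the e-mail <div>s; empty for "Test name 4"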
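
A minimal sketch of the absolute vs. relative difference, again with the JDK's javax.xml.xpath rather than HtmlUnit. Assumptions: the question's markup is wrapped in <body> and parsed as XML, the duplicate id attributes are dropped, and the obfuscated addresses are replaced by placeholder example.com values; RelativeXPathDemo and countsPerBlock are hypothetical names. It counts the matches of an expression evaluated relative to each div.less1:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // The question's structure; e-mail values are placeholders.
    static final String HTML = "<body>"
            + "<div class='less1'><h4>Test name 1</h4><div>"
            + "<div>a@example.com</div><div>b@example.com</div><div>c@example.com</div></div></div>"
            + "<div class='less1'><h4>Test name 2</h4><div>"
            + "<div>d@example.com</div><div>e@example.com</div><div>f@example.com</div></div></div>"
            + "<div class='less1'><h4>Test name 3</h4><div>"
            + "<div>g@example.com</div></div></div>"
            + "<div class='less1'><h4>Test name 4</h4></div>"
            + "</body>";

    /** For each div.less1, the number of nodes matched by expr with that div as context. */
    static List<Integer> countsPerBlock(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList blocks = (NodeList) xpath.evaluate(
                    "//div[@class='less1']", doc, XPathConstants.NODESET);
            List<Integer> counts = new ArrayList<>();
            for (int i = 0; i < blocks.getLength(); i++) {
                NodeList hits = (NodeList) xpath.evaluate(
                        expr, blocks.item(i), XPathConstants.NODESET);
                counts.add(hits.getLength());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countsPerBlock("//h4/text()")); // absolute: [4, 4, 4, 4]
        System.out.println(countsPerBlock("h4/text()"));   // relative: [1, 1, 1, 1]
        System.out.println(countsPerBlock("div/div"));     // relative: [3, 3, 1, 0]
    }
}
```

The absolute expression sees all four headings from every block, while the relative ones stay inside their own block.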
    

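Otherwise, you can extract the data from every node by exploring its child nodes:

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

for (HtmlDivision node : nodes) {
    final NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        final Node child = children.item(i);
        // children can include whitespace text nodes between the tags, so check
        // child.getNodeType() == Node.ELEMENT_NODE before extracting data
    }
}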
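
A runnable sketch of the child-walking approach with plain org.w3c.dom from the JDK (not HtmlUnit). Assumptions: two blocks of the question's markup, with their line breaks kept and a placeholder example.com address; ChildWalkDemo and the "title: emails" output format are hypothetical. It skips non-element children, which is where the whitespace between tags ends up:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildWalkDemo {
    // Two blocks from the question, keeping the whitespace between tags.
    static final String HTML = "<body>\n"
            + "<div class=\"less1\">\n  <h4>Test name 3</h4>\n  <div>\n"
            + "    <div>g@example.com</div>\n  </div>\n</div>\n"
            + "<div class=\"less1\">\n  <h4>Test name 4</h4>\n</div>\n"
            + "</body>";

    /** Walks the direct children of one block and builds "title: email, email". */
    static String describe(Element block) {
        String title = null;
        List<String> emails = new ArrayList<>();
        NodeList children = block.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between tags appears as TEXT_NODE children; skip it.
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            if ("h4".equals(child.getNodeName())) {
                title = child.getTextContent().trim();
            } else if ("div".equals(child.getNodeName())) {
                // The wrapper <div>: each element child holds one address.
                NodeList inner = child.getChildNodes();
                for (int j = 0; j < inner.getLength(); j++) {
                    if (inner.item(j).getNodeType() == Node.ELEMENT_NODE) {
                        emails.add(inner.item(j).getTextContent().trim());
                    }
                }
            }
        }
        return title + ": " + String.join(", ", emails);
    }

    /** Describes every <div class="less1"> in document order. */
    static List<String> describeAll() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            List<String> out = new ArrayList<>();
            NodeList divs = doc.getElementsByTagName("div");
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if ("less1".equals(div.getAttribute("class"))) {
                    out.add(describe(div));
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        describeAll().forEach(System.out::println);
    }
}
```

Walking the children by hand is more verbose than a relative XPath, but it gives full control when the structure varies, as with "Test name 4", which has no e-mail wrapper at all.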