java xml dom4j xpath

dom4j XPath not working parsing xhtml document


I'm trying to use dom4j to parse an XHTML document. If I simply print out the document, I can see the entire document, so I know it is being loaded correctly. The two divs that I'm trying to select are at the exact same level in the document.

html
  body
    div
     table
      tbody
       tr
        td
         table
           tbody
            tr
             td
              div class="definition"
              div class="example"

My code is

List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");

but the list is empty when I do System.out.println(list);

If I only do List<Element> list = document.selectNodes("//html"); it does actually return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.


Solution

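  • XHTML elements live in the namespace http://www.w3.org/1999/xhtml, and an unprefixed name in an XPath 1.0 expression only matches elements in no namespace, which is likely why your path finds nothing. Try declaring the XHTML namespace for the XPath: bind it to a prefix such as x and use //x:html/x:body... as the XPath expression (see also this article, which is for Groovy rather than plain Java, however). Something like the following should probably do it in Java: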

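    import java.util.Map;
    import java.util.TreeMap;
    import org.dom4j.xpath.DefaultXPath;

    // Bind the prefix "x" to the XHTML namespace so prefixed steps can match.
    DefaultXPath xpath = new DefaultXPath("//x:html/x:body/...");
    Map<String,String> namespaces = new TreeMap<String,String>();
    namespaces.put("x","http://www.w3.org/1999/xhtml");
    xpath.setNamespaceURIs(namespaces);

    // Evaluate the namespace-aware expression against the parsed document.
    list = xpath.selectNodes(document);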
    

    (untested)
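
The namespace effect itself can be checked without dom4j. The sketch below swaps in the JDK's built-in javax.xml.xpath API (not dom4j) and runs an unprefixed and a prefixed expression against a tiny XHTML document. The class name XhtmlNamespaceDemo, the helpers parse and count, and the SAMPLE markup are all made up for illustration; only the namespace URI and the prefix x come from the answer above. The NamespaceContext here is deliberately minimal and only knows the one prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: a much smaller document than the one in the question.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"definition\">d</div>"
        + "<div class=\"example\">e</div>"
        + "</body></html>";

    // Parse with namespace awareness switched on, so elements keep their namespace.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Count the nodes an expression selects, with the prefix "x" bound to XHTML.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "x" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("x").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(count(doc, "//html/body/div"));       // prints 0
        System.out.println(count(doc, "//x:html/x:body/x:div")); // prints 2
    }
}
```

The unprefixed path matches nothing because every element in the sample is in the XHTML namespace, while the prefixed path finds both divs. The same reasoning should carry over to the dom4j snippet above, though that code remains untested.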