Tags: xml, xpath, descendant, predicates

XPath query with descendant and descendant text() predicates


I would like to construct an XPath query that returns a "div" or "table" element, provided it has a descendant containing the text "abc". The one caveat is that the returned element must not itself have any div or table descendants.

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

So the only correct result of this query would be:

/div/table/form/div 

My best attempt looks something like this:

//div[contains(//text(), "abc") and not(descendant::div or descendant::table)] | //table[contains(//text(), "abc") and not(descendant::div or descendant::table)]

but does not return the correct result.
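
A likely reason the attempt fails is the `contains(//text(), "abc")` test. Inside the predicate, `//text()` is an absolute path, so it selects every text node in the document regardless of the current div or table. When a node-set is passed to `contains()`, XPath 1.0 uses only the string-value of its first node in document order. The Python sketch below emulates that rule by hand (the standard library cannot evaluate this expression itself, and the helper name `text_nodes` is hypothetical). It assumes whitespace-only text nodes are preserved by the parser, in which case the first text node is just the indentation after the opening div.

```python
import xml.etree.ElementTree as ET

# The document from the question, indentation included.
XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""


def text_nodes(elem):
    """Yield the text nodes below elem in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail


root = ET.fromstring(XML)
all_text = list(text_nodes(root))

# Emulates string(//text()): only the first node of the node-set is kept.
first_text = all_text[0]

# What contains(//text(), "abc") effectively tests.
attempt_sees_abc = "abc" in first_text

# What the question actually wants to know.
some_node_has_abc = any("abc" in t for t in all_text)

print(repr(first_text), attempt_sees_abc, some_node_has_abc)
```

Because the first text node is whitespace, the test is false for every div and table, so under this assumption the attempt selects nothing.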

Thanks for your help.


Solution

  • Something different: :)

    //text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1]
    

    Seems a lot shorter than the other solutions, doesn't it? :)

    Translated to plain English: for every text node in the document that contains the string "abc", select its nearest ancestor that is either a div or a table.

    This is more efficient: only one full scan of the document tree is required, and the ancestor::* traversal is very cheap compared to a descendant:: (subtree) scan.

    To verify that this solution "really works":

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:template match="/">
      <xsl:copy-of select=
      "//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1] "/>
     </xsl:template>
    </xsl:stylesheet>
    

    when this transformation is performed on the provided XML document:

    <div>
      <table>
        <form>
          <div>
            <span>
              <p>abcdefg</p>
            </span>
          </div>
          <table>
            <span>
              <p>123456</p>
            </span>
          </table>
        </form>
      </table>
    </div>
    

    the wanted, correct result is produced:

    <div>
       <span>
          <p>abcdefg</p>
       </span>
    </div>
    

    Note: It isn't necessary to use XSLT. Any XPath 1.0 host, such as DOM, must obtain the same result.
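
The selection rule used by the expression (for each text node containing "abc", take its nearest div or table ancestor) can also be sketched without an XPath engine. The following is a minimal Python version using only the standard library. `xml.etree.ElementTree` does not support the `ancestor` axis or `text()` node tests, so the walk is written by hand instead of being evaluated as XPath. The function name `nearest_div_or_table` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The document from the question.
XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""


def nearest_div_or_table(root, needle):
    """Hand-written equivalent of
    //text()[contains(., needle)]/ancestor::*[self::div or self::table][1]
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for elem in root.iter():  # elements in document order
        # Text nodes whose parent is elem: its leading text and each child's tail.
        texts = [elem.text] + [child.tail for child in elem]
        if not any(t and needle in t for t in texts):
            continue
        # Climb from the text node's parent to the nearest div or table.
        node = elem
        while node is not None and node.tag not in ("div", "table"):
            node = parent_of.get(node)
        # Several text nodes may share an ancestor; report each element once.
        if node is not None and not any(node is seen for seen in found):
            found.append(node)
    return found


root = ET.fromstring(XML)
matches = nearest_div_or_table(root, "abc")
for match in matches:
    print(ET.tostring(match, encoding="unicode"))
```

Like the XPath expression, this sketch only guarantees that no div or table lies between the selected element and the matching text. An element that has a div or table in a different branch would still be selected. If that case matters, appending the predicate `[not(descendant::div or descendant::table)]` to the expression would exclude it.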