
XPath for an element with only child elements named X?


Let's say I have this example:

<div>
<p>some text <em>emphasized text</em> some other text</p>
<p><em>The paragraph I want to capture</em></p>
<p>some text <em>emphasized text</em> some other text and <em>other em text</em> until the end.</p>
</div>

What I want to select is the second paragraph (though it could just as well be the first or the third). The point is that here p and em are adjacent: there is no text between <p> and <em>, neither at the beginning nor at the end. All the text is inside <em>xyz</em>.

How can I select it with an XPath query?

I tried //p/em, //p/child:em, and //em/parent:p, but these all match in all three paragraphs, since every em is a child of a p. //p[starts-with(.,'./em')] didn't help either.


Solution

  • Update

    Per the comments, OP clarifies:

    Yes, I want to capture any paragraph that contains only emphasized text, whether it is enclosed in one em tag or several.

    Therefore, I suggest this updated XPath,

    //p[em][not(node()[not(self::em)])]
    

    will select all p elements that have one or more em child elements and no other child nodes of any kind (no text, comments, or other elements), that is, only fully emphasized paragraphs.
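To make the two predicates of the updated XPath concrete, here is a minimal sketch that mirrors them by hand on a DOM tree, using only Python's standard library (which does not ship a full XPath 1.0 engine). It is not an XPath evaluation, only an illustration of what each predicate tests. The sample markup, the third and fourth paragraphs in it, and the helper names `is_em` and `fully_emphasized` are assumptions made for this example.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the question's target paragraph plus extra cases.
SAMPLE = """<div>
<p>some text <em>emphasized text</em> some other text</p>
<p><em>The paragraph I want to capture</em></p>
<p><em>Two</em><em>em children</em></p>
<p><em>Do</em><sup>not</sup><sub>select!</sub></p>
</div>"""


def is_em(node):
    """True if node is an element named em (the XPath test self::em)."""
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "em"


def fully_emphasized(xml_text):
    """Hand-written mirror of //p[em][not(node()[not(self::em)])]."""
    doc = minidom.parseString(xml_text)
    selected = []
    for p in doc.getElementsByTagName("p"):        # //p
        children = list(p.childNodes)               # node(): children of any type
        has_em = any(is_em(c) for c in children)    # [em]
        only_em = all(is_em(c) for c in children)   # [not(node()[not(self::em)])]
        if has_em and only_em:
            selected.append(p.toxml())
    return selected


result = fully_emphasized(SAMPLE)
for markup in result:
    print(markup)
```

Under these assumptions, the second and third paragraphs are kept and the other two are rejected. Note that a whitespace-only text node between tags is still a child node, so a paragraph written as `<p> <em>x</em> </p>` would not pass this check.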


    Old answer

    This XPath,

    //p[count(node())=1][em]
    

    will select all p elements with a single child node that is an em element.

    Explanation

    • //p selects all p elements in the document.
    • [count(node())=1] filters to only those p elements that have a single child node(). Since node() matches nodes of any type (including both element nodes and text nodes), it will ensure that only p elements with a single child of any type are selected.
    • [em] filters to only those single-child p elements that have an em element child.
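The three steps above can be sketched as a hand-written DOM walk in standard-library Python. This only mirrors the logic of the expression step by step; it does not evaluate XPath. The function name `single_em_child` is hypothetical, and the sample is the markup from the question.

```python
from xml.dom import Node, minidom

# The markup from the question.
SAMPLE = """<div>
<p>some text <em>emphasized text</em> some other text</p>
<p><em>The paragraph I want to capture</em></p>
<p>some text <em>emphasized text</em> some other text and <em>other em text</em> until the end.</p>
</div>"""


def single_em_child(xml_text):
    """Hand-written mirror of //p[count(node())=1][em]."""
    doc = minidom.parseString(xml_text)
    selected = []
    for p in doc.getElementsByTagName("p"):   # //p: every p in the document
        children = p.childNodes               # node(): text, elements, comments...
        if len(children) != 1:                # [count(node())=1]
            continue
        only = children[0]
        # [em]: the single remaining child must be an em element
        if only.nodeType == Node.ELEMENT_NODE and only.tagName == "em":
            selected.append(p.toxml())
    return selected


result = single_em_child(SAMPLE)
print(result)
```

The first and third paragraphs each have text nodes next to their em children, so they fail the child-count step before the em check is ever reached.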

    Therefore, for your input XML/HTML, only the targeted p,

    <p><em>The paragraph I want to capture</em></p>
    

    will be selected. Had there been another p with three em children,

    <p><em>Do</em><em>not</em><em>select</em></p>
    

    or one em child and other element children,

    <p><em>Do</em><sup>not</sup><sub>select!</sub><span> or else!</span></p>
    

    such p elements would not have been selected.

    Warning: The XPath in the other answer here, //p[not(text())][em], however, would select such p elements.
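The difference between the two expressions can be checked with a small sketch that mirrors each predicate by hand on a DOM tree (standard-library Python, not a real XPath evaluation). The helper names are hypothetical, and the sample combines the target paragraph with the two counterexamples above.

```python
from xml.dom import Node, minidom

SAMPLE = """<div>
<p><em>The paragraph I want to capture</em></p>
<p><em>Do</em><em>not</em><em>select</em></p>
<p><em>Do</em><sup>not</sup><sub>select!</sub><span> or else!</span></p>
</div>"""


def has_em_child(p):
    """Mirror of the [em] predicate: at least one em element child."""
    return any(
        c.nodeType == Node.ELEMENT_NODE and c.tagName == "em"
        for c in p.childNodes
    )


def single_node_with_em(p):
    """Mirror of [count(node())=1][em]."""
    return len(p.childNodes) == 1 and has_em_child(p)


def no_text_with_em(p):
    """Mirror of [not(text())][em]: no text node children, at least one em."""
    no_text = not any(c.nodeType == Node.TEXT_NODE for c in p.childNodes)
    return no_text and has_em_child(p)


def count_matches(xml_text, predicate):
    """Count the p elements in the document that satisfy predicate."""
    doc = minidom.parseString(xml_text)
    return sum(1 for p in doc.getElementsByTagName("p") if predicate(p))


strict = count_matches(SAMPLE, single_node_with_em)
loose = count_matches(SAMPLE, no_text_with_em)
print(strict, loose)
```

In this sample the single-node check keeps only the target paragraph, while the no-text check accepts all three, because neither counterexample has a text node as a direct child of p.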

    See also