Let's say I have this example:
<div>
<p>some text <em>emphasized text</em> some other text</p>
<p><em>The paragraph I want to capture</em></p>
<p>some text <em>emphasized text</em> some other text and <em>other em text</em> until the end.</p>
</div>
What I want to select is the second paragraph (but it may be the third or first as well). The thing is that here p and em are adjacent: there is no text between <p> and <em>, neither at the beginning nor at the end. All the text is inside <em>xyz</em>.
How can I get it with an XPath query?
I tried //p/em, //p/child:em, and //em/parent:p; all of these select the three paragraphs, as all em are children of p.
//p[starts-with(.,'./em')] didn't help either.
Per the comments, OP clarifies:
Yes, I want to capture any paragraph that contains only emphasized text, whether it is enclosed in one or more em tags.
Therefore, I suggest this updated XPath,
//p[em][not(node()[not(self::em)])]
which will select all p elements with one or more em child elements but no other children of any sort, that is, only fully emphasized paragraphs.
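To make the logic of that predicate concrete, here is a minimal Python sketch using only the standard library. Python's xml.etree.ElementTree supports just a subset of XPath (no node() or count()), so the predicate is emulated by hand rather than evaluated. The helper names (has_text_child, fully_emphasized, select_fully_emphasized) and the extra fourth paragraph in the sample are made up for illustration, and comments or processing instructions are ignored because the default parser drops them.

```python
import xml.etree.ElementTree as ET

# The question's sample, plus one hypothetical paragraph with two em children.
SAMPLE = """<div>
<p>some text <em>emphasized text</em> some other text</p>
<p><em>The paragraph I want to capture</em></p>
<p>some text <em>emphasized text</em> some other text and <em>other em text</em> until the end.</p>
<p><em>two</em><em>ems</em></p>
</div>"""


def has_text_child(p):
    """True if p has at least one text-node child (even whitespace-only)."""
    # ElementTree stores text before the first child in p.text and text
    # following each child in that child's .tail attribute.
    return bool(p.text) or any(child.tail for child in p)


def fully_emphasized(p):
    """Rough equivalent of the predicate [em][not(node()[not(self::em)])]."""
    children = list(p)
    return (
        len(children) > 0                                  # at least one child ...
        and all(child.tag == "em" for child in children)   # ... and all are em
        and not has_text_child(p)                          # no text-node children
    )


def select_fully_emphasized(xml_text):
    """Return every p element in the document that passes fully_emphasized."""
    root = ET.fromstring(xml_text)
    return [p for p in root.iter("p") if fully_emphasized(p)]


matches = select_fully_emphasized(SAMPLE)
for p in matches:
    print(ET.tostring(p, encoding="unicode").strip())
```

On this sample the sketch keeps the second and fourth paragraphs. As in XPath, whitespace between the tags would count as a text node, so a paragraph written as <p> <em>x</em> </p> would not match. With a full XPath 1.0 engine, the expression itself can be evaluated directly instead of being emulated like this.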
This XPath,
//p[count(node())=1][em]
will select all p elements with a single child node that is an em element.
//p selects all p elements in the document.
[count(node())=1] filters to only those p elements that have a single child node(). Since node() matches nodes of any type (including both element nodes and text nodes), it ensures that only p elements with a single child of any type are selected.
[em] filters to only those single-child p elements that have an em element child.
Therefore, for your input XML/HTML, only the targeted p,
<p><em>The paragraph I want to capture</em></p>
will be selected. Had there been another p with three em children,
<p><em>Do</em><em>not</em><em>select</em></p>
or one em child and other element children,
<p><em>Do</em><sup>not</sup><sub>select!</sub><span> or else!</span></p>
such p elements would not have been selected.
Warning: The XPath in the other answer here, //p[not(text())][em], however, would select such p elements.
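The difference between the two predicates can be checked with a small standard-library Python sketch. Since xml.etree.ElementTree does not implement count() or node(), both predicates are emulated by hand; the function names (child_node_count, single_em_child, no_text_but_em) are hypothetical, and only element and text children are counted because the default parser discards comments and processing instructions.

```python
import xml.etree.ElementTree as ET

# The three example paragraphs discussed in this answer.
EXAMPLES = """<div>
<p><em>The paragraph I want to capture</em></p>
<p><em>Do</em><em>not</em><em>select</em></p>
<p><em>Do</em><sup>not</sup><sub>select!</sub><span> or else!</span></p>
</div>"""


def child_node_count(p):
    """Count element and text-node children, approximating count(node())."""
    # Text before the first child lives in p.text; text after each child
    # lives in that child's .tail attribute.
    text_nodes = (1 if p.text else 0) + sum(1 for child in p if child.tail)
    return len(p) + text_nodes


def single_em_child(p):
    """Rough equivalent of [count(node())=1][em]."""
    return child_node_count(p) == 1 and p.find("em") is not None


def no_text_but_em(p):
    """Rough equivalent of [not(text())][em]."""
    has_text = bool(p.text) or any(child.tail for child in p)
    return not has_text and p.find("em") is not None


root = ET.fromstring(EXAMPLES)
paragraphs = list(root.iter("p"))
strict = [single_em_child(p) for p in paragraphs]
loose = [no_text_but_em(p) for p in paragraphs]
print(strict)  # [True, False, False]
print(loose)   # [True, True, True]
```

Only the first paragraph passes the single-child check, while the text-only check accepts all three, which is the behavior the warning describes.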