xml, xpath, xpath-1.0

How to extract nodes with no immediate text after them with XPath 1.0?


How can I select nodes that have at least one following-sibling node, but no text node immediately after them, using a single XPath 1.0 expression?

For instance, from the following XML:

<p>This is some <b>forma</b><b>tted</b> text, this is <b>bold</b>.</p>

I want to extract the first <b> tag.

I have come up with the following expression so far:

//b[following-sibling::*[1][self::b]][not(text() = following-sibling::text()[1]/preceding-sibling::*[1][self::b]/text())]

However, it fails to select the tag when the two adjacent tags contain identical text, for example:

<p>I am hungry for <b>paw</b><b>paw</b>.</p>

Is there a better, simpler way?


Solution

  • This XPath,

    //*[following-sibling::node()[1][not(self::text())]]
    

    will select all elements that have an immediately following sibling that is not a text node.
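
To check the idea against the two sample documents from the question, here is a minimal Python sketch. It does not run the XPath expression itself: the standard library has no full XPath 1.0 engine, so the sketch emulates the predicate by inspecting each element's `nextSibling` with `xml.dom.minidom`. The helper names `elements_without_following_text` and `inner_text` are made up for this illustration.

```python
from xml.dom import Node, minidom


def elements_without_following_text(xml_source):
    """Return elements whose immediately following sibling exists
    and is not a text node (emulates the XPath predicate by hand)."""
    doc = minidom.parseString(xml_source)
    matches = []
    # getElementsByTagName("*") visits every element in document order.
    for element in doc.getElementsByTagName("*"):
        sibling = element.nextSibling
        # XPath treats CDATA sections as text, so exclude them as well.
        if sibling is not None and sibling.nodeType not in (
            Node.TEXT_NODE,
            Node.CDATA_SECTION_NODE,
        ):
            matches.append(element)
    return matches


def inner_text(element):
    """Concatenate the direct text children of an element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


first = elements_without_following_text(
    "<p>This is some <b>forma</b><b>tted</b> text, this is <b>bold</b>.</p>"
)
second = elements_without_following_text(
    "<p>I am hungry for <b>paw</b><b>paw</b>.</p>"
)

print([inner_text(e) for e in first])
print([inner_text(e) for e in second])
```

In both samples only the first of the two adjacent `<b>` elements is returned, including the case where both contain the same text. To restrict the XPath expression to `<b>` elements only, replace `*` with `b`.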