html, xml, xpath, string-concatenation

XPath to return string concatenation split by HTML tag


How can I return a string value containing the concatenated text values using an XPath expression?

<div>
This text node (1) should be returned.
<em>And the value of this element.</em>
And this.
</div>

<div>
This text node (2) should be returned.
And this.
</div>

<div>
This text node (3) should be returned.
<em>And the value of this element.</em>
And this.
</div>

The returned value should be an array of strings split by div element:

"This text node (1) should be returned. And the value of this element. And this."
"This text node (2) should be returned. And this."
"This text node (3) should be returned. And the value of this element. And this."

Is this possible in a single XPath expression?


Solution

  • XPath 1.0

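    This cannot be done in pure XPath 1.0: a string function applied to a node-set uses only the first node, and an expression cannot return a list of strings. Instead, select the div elements: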

    //div
    

    and then normalize the whitespace of each div element's string value in the language hosting the XPath library call.
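As a rough illustration of that host-language step, here is a minimal Python sketch using only the standard library. Several things in it are assumptions, not part of the question: the `<root>` wrapper (added so the three div elements form one well-formed document), the helper names `normalize_space` and `div_strings`, and the choice of Python itself. ElementTree supports only a subset of XPath 1.0, so the selection `//div` is written as `.//div` relative to the root element.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the three div elements from the question are placed
# under a single <root> element so that the sample is well-formed XML.
SAMPLE = """<root>
<div>
This text node (1) should be returned.
<em>And the value of this element.</em>
And this.
</div>

<div>
This text node (2) should be returned.
And this.
</div>

<div>
This text node (3) should be returned.
<em>And the value of this element.</em>
And this.
</div>
</root>"""

# XPath treats space, tab, carriage return and line feed as whitespace.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text):
    """Approximate XPath normalize-space(): collapse whitespace runs, then trim."""
    return _XML_WS.sub(" ", text).strip(" ")


def div_strings(xml_text):
    """Return one normalized string per div element, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() yields every descendant text node in document order, so
    # joining the pieces gives the element's string value.
    return [
        normalize_space("".join(div.itertext()))
        for div in root.findall(".//div")
    ]


if __name__ == "__main__":
    for line in div_strings(SAMPLE):
        print(line)
```

The same pattern should carry over to other hosts: select the elements with XPath, then compute and normalize each string value in the calling code.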

  • XPath 2.0

    This XPath 2.0 expression,

    //div/normalize-space()
    

    returns the whitespace-normalized string value of each div element in the document, one string per element:

    This text node (1) should be returned. And the value of this element. And this.
    This text node (2) should be returned. And this.
    This text node (3) should be returned. And the value of this element. And this.
    

    as requested.