Tags: xml, xpath, xmllint

How to use text() in XPath 1.0 with a suffix?


This is what I'm doing to retrieve all the text from an XHTML document:

//p/text()

It works, but the text nodes come out one after another with no separator. I want to add a space between them, so I tried:

//p/concat(text(), " ")

No luck:

XPath error : Invalid expression
//p/concat(text(), " ")
           ^

I'm using xmllint version 20902 (libxml2 2.9.2).


Solution

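  • XPath is really for selection, not manipulation. By going beyond selecting text to arranging it with spaces between the selected items, you're crossing the line from selection to manipulation. For manipulation, consider XSLT instead of plain XPath. (This is also why xmllint rejects //p/concat(text(), " "): in XPath 1.0 each step of a location path must be a node test such as text() or p, so a function call cannot appear after a /.)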

    That said, if you can use XPath 2.0 (via another tool; xmllint supports only XPath 1.0), you can join the selected strings with string-join():

     string-join(//p/text(), ' ')
    

    Note, however, that //p/text() will miss text in div, span, etc., including text inside elements nested within a p. Perhaps you meant //*/text() or //text()? Note also that even in XPath 1.0 you can get all of the text via string(/), although that won't add the spaces you want either.
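If you can step outside xmllint, the selections discussed above can be sketched in Python with the standard-library xml.etree.ElementTree. ElementTree's limited XPath support has no text() node test, so this is an emulation rather than real XPath: the sample document and the helper names p_text_nodes, all_text_nodes and string_value are hypothetical. It illustrates what string-join(//p/text(), ' '), string-join(//text(), ' ') and string(/) would produce, including how //p/text() skips the text inside <b>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input. A real XHTML file would also declare the XHTML
# namespace, which turns the tag "p" into "{http://www.w3.org/1999/xhtml}p".
DOC = ("<html><body><p>Hello <b>bold</b> world</p>"
       "<div>In a div <span>and a span</span></div>"
       "<p>Second</p></body></html>")


def p_text_nodes(root):
    """Emulates //p/text(): only the direct text children of each <p>."""
    nodes = []
    for p in root.iter("p"):
        if p.text:                      # text before the first child element
            nodes.append(p.text)
        for child in p:
            if child.tail:              # text following each child element
                nodes.append(child.tail)
    return nodes


def all_text_nodes(root):
    """Emulates //text(): every text node, in document order."""
    return list(root.itertext())


def string_value(root):
    """Emulates string(/): all text concatenated with no separator."""
    return "".join(root.itertext())


root = ET.fromstring(DOC)
print(" ".join(p_text_nodes(root)))    # like string-join(//p/text(), ' ')
print(" ".join(all_text_nodes(root)))  # like string-join(//text(), ' ')
print(string_value(root))              # like string(/): words run together
```

Joining raw text nodes keeps their own whitespace, so "Hello " and " world" become "Hello   world"; XPath 2.0's string-join() behaves the same way, since it only inserts the separator between items.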