html ruby xpath xhtml rexml

Using XPath: find last text node of each paragraph under the root node


I want to trim trailing whitespace at the end of all XHTML paragraphs. I am using Ruby with the REXML library.

Say I have the following in a valid XHTML file:

<p>hello <span>world</span> a </p>
<p>Hi there </p>
<p>The End </p>

I want to end up with this:

<p>hello <span>world</span> a</p>
<p>Hi there</p>
<p>The End</p>

So I was thinking I could use XPath to select just the text nodes I want and then trim their text, which would give me the output above.

I started with the following XPath:

//root/p/child::text()

Of course, the problem here is that it returns every text node that is a direct child of any p element, namely:

'hello '
' a '
'Hi there '
'The End '

Trying the following XPath gives me the last text node of the last paragraph, not the last text node of each paragraph that is a child of the root node.

//root/p/child::text()[last()]

This only returns: 'The End '

What I would like to get from the XPath is therefore:

' a '
'Hi there '
'The End '

Can I do this with XPath, or should I be looking at regular expressions instead (probably more of a headache than XPath)?


Solution

  • Your example worked for me:

    //p/child::text()[last()]
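
    If the XPath predicate does not behave this way in your REXML version (the question reports getting only 'The End '), one way to sidestep it is to loop over the paragraphs in Ruby and trim the last direct text child of each. The following is only a minimal sketch, not the method used above: it assumes the paragraphs sit under a <root> element, as the question's XPath implies, and relies on REXML's Element#texts and Text#value=.

```ruby
require 'rexml/document'

# Sample input from the question, wrapped in an assumed <root> element.
xml = <<~XML
  <root>
  <p>hello <span>world</span> a </p>
  <p>Hi there </p>
  <p>The End </p>
  </root>
XML

doc = REXML::Document.new(xml)

# Visit each <p> that is a direct child of the root element.
doc.root.elements.each('p') do |para|
  # Element#texts returns only the direct Text children, in document order.
  last_text = para.texts.last
  next if last_text.nil?

  # Strip trailing whitespace from that one text node only.
  last_text.value = last_text.value.rstrip
end

# Serialize each paragraph again so the result can be inspected.
result = doc.root.elements.to_a('p').map(&:to_s)
puts result
```

    Note that this trims the last text node that is a direct child of each paragraph, which is what the question asks for. If a paragraph ends with an element such as <span>, whitespace inside that element is left untouched.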