xpath, xml-parsing, html-parsing

XPath node that doesn't contain a child


I'm trying to select a certain element using XPath, but I just can't seem to get it, and I don't quite understand why.

<ul class="test1" id="content">
    <li class="list">
        <p>Insert random text here</p>
        <div class="author">
        </div>
    </li>
    <li class="list">
        <p>I need this text here</p>
    </li>
</ul>

Basically, the text I want is in the second p, and I want/need to retrieve it with something similar to p[not(div)].

I have tried the methods from the following question, but to no avail: "xpath find node that does not contain child".

Here is how I tried accessing the text:

ul[contains(@id,"content")]//p[not(.//div)]/text()

If you have any possible answers, thank you!


Solution

  • The HTML snippet posted in the question shows that neither p element contains a div, so the expression //p[not(.//div)] matches both of them. In the first li, the div is a sibling of the p (both share the same parent element li), not a child or descendant of it. The condition therefore has to be applied to the li rather than to the p. The following XPath expression matches the text nodes of the second p and not those of the first one:

    //ul[contains(@id,"content")]/li[not(div)]/p/text()
    

    Brief explanation:

    • //ul[contains(@id,"content")]: find ul elements whose id attribute value contains the text "content"
    • /li[not(div)]: from such ul, find child li elements that don't have a child div element. This matches only the last li in the example HTML
    • /p/text(): from such li, find child p elements and then return the child text nodes from such p
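
The difference between the two predicates can be checked with a small script. This is only a sketch, and it makes one substitution: Python's standard xml.etree.ElementTree supports a limited XPath subset without not() or contains(), so the predicates are written as plain Python conditions instead of being passed as XPath strings. The variable names are illustrative. With a full XPath 1.0 engine (for example lxml, assuming it is installed) the expression from the answer could be evaluated directly.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; it is well-formed, so an XML parser accepts it.
HTML = """
<ul class="test1" id="content">
    <li class="list">
        <p>Insert random text here</p>
        <div class="author">
        </div>
    </li>
    <li class="list">
        <p>I need this text here</p>
    </li>
</ul>
"""

root = ET.fromstring(HTML)  # root is the <ul> element

# Stand-in for ul[contains(@id,"content")]
assert "content" in root.get("id", "")

# Stand-in for //p[not(.//div)]: p elements with no div descendant.
# Neither p contains a div, so both are selected.
p_without_div = [p.text for p in root.iter("p") if p.find(".//div") is None]

# Stand-in for /li[not(div)]/p/text(): p children of li elements
# that have no div child. p.text is the text before the first child
# element, which is the whole text of each p here.
p_in_li_without_div = [
    p.text
    for li in root.findall("li")
    if li.find("div") is None
    for p in li.findall("p")
]

print(p_without_div)
print(p_in_li_without_div)
```

The first list contains the text of both p elements, while the second contains only "I need this text here", because the check is made on the li that owns the div rather than on the p.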