Capture element based on previous text using XPath


I'm trying to get a <p> element by its previous text. Example:

<div>
    Header:
    <p>ITEM</p>
    ID:
    <p>123</p>
    Title:
    <p>Test</p>
</div>

where I want to capture "123". I've tried a couple of combinations of preceding-sibling but haven't been able to get it.

.//p[preceding-sibling::node()[1][self::text()][.='ID:']]

.//p[preceding-sibling::text()='ID:']

I don't have control over the HTML and they don't want to change it. I will always know the text before the paragraph I want to capture. Is this possible?

Edit: added more to the example. The element to grab won't always be the first/last item to find.


Solution

  • This XPath,

    //p[preceding-sibling::node()[1][normalize-space()='ID:']]
    

    will select all p elements whose immediately preceding sibling node (here, a text node) has a space-normalized string value of ID:.

    Notes:

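    • Your first try was close but failed to account for the whitespace surrounding ID:. The text node also contains the newlines and indentation around the label, so the exact comparison .='ID:' never matches.
    • Your second try has the same whitespace problem and additionally drops the immediacy constraint: it tests every preceding text sibling, not just the nearest one.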
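
    As a quick check, here is a minimal sketch that runs the expression against the sample markup from the question. It assumes Python with the third-party lxml library, which the question does not mention; the helper name p_after_label and the $label XPath variable are illustrative only. Any XPath 1.0 engine should behave the same way.

```python
from lxml import etree

# Sample markup copied from the question.
HTML = """<div>
    Header:
    <p>ITEM</p>
    ID:
    <p>123</p>
    Title:
    <p>Test</p>
</div>"""

root = etree.fromstring(HTML)


def p_after_label(label):
    """Hypothetical helper: text of each <p> whose nearest preceding
    sibling node has `label` as its space-normalized string value."""
    expr = "//p[preceding-sibling::node()[1][normalize-space()=$label]]"
    # lxml binds $label from the keyword argument, so no quoting is needed.
    return [p.text for p in root.xpath(expr, label=label)]


id_values = p_after_label("ID:")
title_values = p_after_label("Title:")

# The first attempt from the question compares the raw text node, which
# still carries the surrounding newlines and indentation.
exact_match = root.xpath(
    ".//p[preceding-sibling::node()[1][self::text()][.='ID:']]"
)

print(id_values)     # ['123']
print(title_values)  # ['Test']
print(exact_match)   # []
```

    Passing the label as a variable rather than building the expression by string concatenation avoids quoting problems if the label itself ever contains an apostrophe.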