
XPath to select text preceded by a specific element


I've got the following HTML:

<body>
    <h1 id = 'example'>text</h1>
    "My car is a "
    <abbr>
        <a href = 'exampleRef'>
            Ferrari
        </a>
    </abbr>
    "that goes 100 km/h"
</body>

I'm trying to extract the text "My car is a Ferrari that goes 100 km/h". The text is not contained in any specific element, so I thought of using the following-sibling axis to extract at least "My car is a". I tried the following expression:

//h1[@id ='example']/following-sibling::text()

and also

//h1[@id ='example']/following-sibling

but got no matches.


Solution

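  • If you're able to use XPath 2.0+, you could use string-join() on the following sibling nodes and then collapse the whitespace with normalize-space(). In XPath 2.0, string-join() requires a separator argument (the one-argument form was only added in XPath 3.0), so pass an empty string:

    normalize-space(string-join(//h1[@id='example']/following-sibling::node(), ''))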
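If you're stuck with XPath 1.0 (lxml and the browsers' document.evaluate() both implement 1.0), there is no string-join(), and string() applied to a node-set only returns the string value of the first node. In that case the joining has to be done in the host language. Here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. ElementTree has no following-sibling axis, so the sketch walks the h1's parent instead, relying on the fact that ElementTree stores the text after an element in that element's .tail. It assumes the markup is well-formed XML and that the double quotes in the question are how the browser inspector displays bare text nodes, not literal characters in the page; the function name text_after is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question. Assumption: the double quotes shown in the
# question are the inspector's rendering of text nodes, so they are left out here.
HTML = """<body>
    <h1 id = 'example'>text</h1>
    My car is a
    <abbr>
        <a href = 'exampleRef'>
            Ferrari
        </a>
    </abbr>
    that goes 100 km/h
</body>"""


def text_after(root, h1_id):
    """Hypothetical helper emulating
    normalize-space(string-join(//h1[@id=...]/following-sibling::node(), ''))."""
    # ElementTree elements don't know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    h1 = root.find(f".//h1[@id='{h1_id}']")
    if h1 is None:
        return None
    siblings = list(parent_of[h1])
    parts = [h1.tail or ""]  # the text node directly after </h1>
    for sib in siblings[siblings.index(h1) + 1:]:
        parts.append("".join(sib.itertext()))  # string value of the sibling element
        parts.append(sib.tail or "")  # the text node after the sibling element
    # Equivalent of normalize-space(): trim the ends, collapse inner whitespace runs.
    return " ".join("".join(parts).split())


print(text_after(ET.fromstring(HTML), "example"))
# → My car is a Ferrari that goes 100 km/h
```

Because the string value of each following element (here the abbr) includes its descendants' text, "Ferrari" is picked up even though it sits two levels down inside the a element.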