xpath, web-scraping, scrapy, html-parsing

XPath: extract text after using attribute selectors


I want to extract some text from an HTML file using only XPath. In the Chrome console I can get the text with:

1) TEXT=$x('//*[@id="olpOfferListColumn"]')

2) TEXT[0].innerText

Now I want to combine these two commands into one, using only XPath. I tried things like:

TEXT=$x('//*[@id="olpOfferListColumn"]/text()') 

or

TEXT=$x('//*[@id="olpOfferListColumn"]/::text()') 

Solution

    1. //*[@id="olpOfferListColumn"]/text() returns only the child text nodes, i.e. text that sits directly inside the element. The #olpOfferListColumn element has no child text nodes, only descendant text nodes. To select all descendant text nodes, you can use //*[@id="olpOfferListColumn"]//text().

    2. //*[@id="olpOfferListColumn"]/::text() is invalid XPath: :: must be preceded by an axis name, as in child::text() or descendant::text().

    Try

    string(//*[@id="olpOfferListColumn"])
    

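    to get all the text content of #olpOfferListColumn as a single string. This is the closest XPath analogue of the innerText property; strictly speaking it matches textContent, because it concatenates every descendant text node without regard to how the page is rendered.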
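
The difference between child text nodes, descendant text nodes and the string value can be sketched outside the browser. The snippet below is only an illustration under some assumptions. The markup is made up (the real #olpOfferListColumn is not shown in the question), and it uses Python's standard-library ElementTree, which does not support text() or string() in its XPath subset. The three helper functions are hypothetical stand-ins that mimic what /text(), //text() and string() would select.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the real page.
# As in the question, the outer div has no text of its own.
html = (
    '<body>'
    '<div id="olpOfferListColumn">'
    '<div class="offer"><span>$10.99</span> <b>New</b></div>'
    '<div class="offer"><span>$8.50</span> <b>Used</b></div>'
    '</div>'
    '</body>'
)

root = ET.fromstring(html)
# ElementTree's limited XPath does support attribute predicates.
column = root.find(".//*[@id='olpOfferListColumn']")


def child_text_nodes(el):
    """Text directly inside el: roughly what XPath /text() selects."""
    parts = [el.text] + [child.tail for child in el]
    return [p for p in parts if p]


def descendant_text_nodes(el):
    """Text at any depth below el: roughly what //text() selects."""
    return list(el.itertext())


def string_value(el):
    """All descendant text joined in document order, like string()."""
    return "".join(el.itertext())


print(child_text_nodes(column))       # no direct text, so an empty list
print(descendant_text_nodes(column))  # one entry per text node
print(string_value(column))           # a single concatenated string
```

With a full XPath 1.0 engine the original expressions can be used directly. The helpers above only make the child/descendant distinction visible.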