Scrapy / extracting data across multiple HTML tags


Newbie to Scrapy, but catching up fast. There's one thing I can't figure out, though, despite Googling and Copiloting, so I appreciate your patience :) I have some HTML that looks like this:

<p>
   "The "
   <strong class="meep">cat</strong>
   " sat "
   <a href="whatever1" title="whatever2">on</a>
   " the mat."
</p>

I went to the parent div of the p, and executed:

response.xpath('//div[@class="whatever3"]/p[2]/text()').extract()

...but it outputs ['The ', 'sat', ' the mat.']

How can I add to the code to get "The cat sat on the mat."? I also tried following-sibling syntax, but just couldn't get it to work. I also tried using join but couldn't get that to work, here, either...

Appreciate thoughts.


Solution

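  • /text() selects only the text nodes that are direct children of the p, so the text inside strong and a is skipped. To select every text node under the p, including those inside child elements, use //text():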

    response.xpath('//div[@class="whatever3"]/p[2]//text()').extract() 
    

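    join then concatenates the extracted strings into a single string. With ' ' as the separator, a space is inserted between every pair of text nodes; if the nodes already carry their own spaces (as 'The ' and ' sat ' do here), use ''.join instead to avoid doubled spaces.

      ' '.join(response.xpath('//div[@class="whatever3"]/p[2]//text()').extract())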
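
    The join step can be tried without a running spider. The following is a minimal sketch, not part of the original answer: the list below is assumed to be what //text() returns for the p in the question, and join_text_nodes is a hypothetical helper name. It joins without a separator and then collapses any leftover whitespace, which also covers pages where the text nodes contain newlines or indentation.

```python
def join_text_nodes(parts):
    """Concatenate extracted text nodes and collapse runs of whitespace."""
    # Join without a separator: the text nodes already carry their own spaces.
    joined = "".join(parts)
    # split() with no arguments splits on any run of whitespace and drops
    # leading/trailing whitespace, so newlines and indentation disappear too.
    return " ".join(joined.split())


# Assumed sample: the strings //text() would yield for the <p> in the question.
parts = ["The ", "cat", " sat ", "on", " the mat."]
print(join_text_nodes(parts))
```

    In a spider, the same helper would be called on the result of response.xpath('//div[@class="whatever3"]/p[2]//text()').extract().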