python, web-scraping, scrapy, scrapy-shell

Scrapy shell: correct XPath selector for getting info from a table?


I'm trying to obtain the correct XPath for extracting the information circled in red in the image below:

[image]

I've tried copying the XPath and pasting it into the Scrapy shell, but it isn't working. I'm having difficulty because the information is contained inside a table and every element of the table has the same name. The website is:

https://virtualmuebles.com/muebles-sala/mesa-tv-invy-1c-casa-linda-wg


Solution

  • Assuming the text 'Marca' is constant across all the pages you want to scrape: first search for a b element containing the text 'Marca'. Step to its parent, provided that parent is a td element. Then select the following sibling td element and get its text node:

    response.xpath("//b[contains(text(),'Marca')]/parent::td/following-sibling::td/text()").get()
    

    Otherwise, if the value is always in the second td element of the fourth tr element, select it by position:

    response.xpath("//tr[4]/td[2]/text()").get()
    

    outputs:

    'RTA Design'