scrapy, scrapy-shell

Scrapy downloads the HTML page but cannot get data using XPath or CSS


I am trying to scrape this page. When I run scrapy shell "https://redsea.com/en/apple-iphone-x-64gb-silver.html", it downloads the HTML page, and I can view the downloaded HTML in the browser with view(response).

But when I try to get any data (the product name, for example) with response.css('.page-title'), I get an empty result.

When a website fetches its data through a REST API, Scrapy downloads only the HTML structure without the data, so it makes sense that Scrapy cannot extract that data. But in this case Scrapy downloads the HTML file with the data, yet cannot read it using CSS or XPath selectors. I don't understand this behavior.


Solution

  • But in this case scrapy downloads the html file with data but not able to read it using css or xpaths.

    It doesn’t. When you open the downloaded HTML in a browser, its JavaScript loads the content into the DOM, either from a separate URL or from values hard-coded in the JavaScript, which is why you can see the content using view(response).

    If you inspect the actual HTML content (e.g. open the page source in your browser, Ctrl+U in Firefox), you’ll see that the data you want is either not there at all or inside a <script> element.

    Open the Network tab of your browser's developer tools, force-reload the page (Ctrl+Shift+R in Firefox), and watch the additional requests that are performed in the background; one of them is likely to have the desired data.

    You can then have Scrapy perform a request similar to the one made in the background.
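
For the case where the data turns out to be inside a <script> element, here is a minimal sketch of pulling it out of the raw HTML. It assumes, purely for illustration, that the page embeds the product data as JSON in a <script type="application/ld+json"> block; the real page may use a different structure, so check the page source first. The names JsonLdExtractor, extract_json_ld and SAMPLE_HTML are hypothetical, and the sample markup is made up, not taken from the site. Only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script block opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD block in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up markup: the visible title element is empty in the raw HTML,
# while the data sits in a script block (an assumption for this sketch).
SAMPLE_HTML = """
<html><body>
<h1 class="page-title"></h1>
<script type="application/ld+json">
{"@type": "Product", "name": "Example Phone 64GB"}
</script>
</body></html>
"""

items = extract_json_ld(SAMPLE_HTML)
print(items[0]["name"])
```

In the Scrapy shell, the same helper could be called as extract_json_ld(response.text). If the data is not in the page source at all, this approach will not help, and reproducing the background request found in the Network tab is the way to go.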