javascript, python, scrapy, scrapy-splash

Scrapy splash not returning results


I'm learning Scrapy (with Splash) and building a spider to scrape results from JavaScript-rendered pages. The spider works and does return results for JS pages. However, it does not return the price from this link: https://www.zara.com/us/en/bejewelled-appliqu%C3%A9-dress-p07854034.html?v1=4818592&v2=733885

XPath used: //*[contains(concat( " ", @class, " " ), concat( " ", "_product-price", " " ))]//span/text()

The XPath above returns results in the browser but returns nothing when invoked via Scrapy. Here's my spider call:

yield scrapy.Request(url, callback=self.parse_page, dont_filter=True, meta={'splash': {'args': {'wait': 5,},'endpoint': 'render.html',}})

Can you please help me figure out why the price is not returned?

Thanks!


Solution

  • The problem is that the price is not present at all in the HTML that Splash renders. The easiest way to confirm this is to open the Splash console in a web browser (port 8050), enter your URL, and inspect the rendered output. Start with the Splash FAQ entry on pages that are not rendered correctly. In your case the solution is to disable private mode for Splash, either via the --disable-private-mode startup option (for example when starting the Docker container) or by setting splash.private_mode_enabled = false in your Lua script. After disabling private mode, the page renders correctly.
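
The Lua-script route could look roughly like the sketch below. It keeps the same `meta={'splash': ...}` style as the request in the question but switches the endpoint from `render.html` to `execute`, since a Lua script can only be run through that endpoint. The helper name `build_splash_meta` and the script body are illustrative assumptions, not code from the question or from the Splash docs, and the sketch assumes scrapy-splash fills in `args.url` from the request URL.

```python
# Lua script run by Splash's "execute" endpoint (illustrative sketch).
# It disables private mode before loading the page, then returns the HTML.
LUA_SOURCE = """
function main(splash, args)
    splash.private_mode_enabled = false
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return splash:html()
end
"""


def build_splash_meta(wait=5):
    """Build the 'meta' dict for scrapy.Request (hypothetical helper).

    Mirrors the structure used in the question, but targets the
    'execute' endpoint and passes the Lua script as 'lua_source'.
    """
    return {
        "splash": {
            "endpoint": "execute",
            "args": {
                "lua_source": LUA_SOURCE,
                "wait": wait,
            },
        }
    }


# Inside the spider, the request from the question would then become:
#   yield scrapy.Request(url, callback=self.parse_page,
#                        dont_filter=True, meta=build_splash_meta())
```

If you would rather not maintain a Lua script, the startup-option route leaves the original `render.html` request unchanged and only alters how Splash itself is launched.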