
Scrapy and JavaScript sites on Raspberry Pi


I am trying to use Scrapy on a Raspberry Pi to scrape a page that is modified by JavaScript after the initial load.

I tried to install Docker and scrapinghub/splash to render the page before passing it to Scrapy, but found that Splash doesn't support ARM yet. Are there other options for scraping JavaScript-rendered pages with Scrapy on a Raspberry Pi?

Currently, a normal Scrapy request to the site returns only the HTML below, because the initial page is an empty template and JavaScript renders the content afterwards. Before the JavaScript runs, the page source looks essentially empty:

<body class="notie8 notie9 lang-{{html.lang}}">
<!--<![endif]-->
    <div loading-line></div>

    <div page-layout>
        <div ng-view></div>
    </div>
</body>
</html>

For reference, the site I am referring to is: https://www.sreality.cz/hledani/prodej/byty?region=brno


Solution

  • Sreality exposes an API, so wouldn't querying it directly be the way to go? For your URL, the page makes this API call: https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&per_page=20&region=brno&tms=1502631428897 (look for XHR requests in your browser's developer tools).
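
As a rough sketch of that approach, the snippet below rebuilds the API URL from the query parameters in the call above and pulls a few fields out of the JSON response. Several things here are assumptions, not confirmed by the site: that tms is a millisecond timestamp that can be regenerated, that listings sit under "_embedded" then "estates", and that each listing has "name", "price" and "locality" fields. The helper names build_api_url and parse_estates are made up for illustration. Check the actual response in your browser's developer tools before relying on any of it.

```python
import json
import time
from urllib.parse import urlencode

API_BASE = "https://www.sreality.cz/api/cs/v2/estates"


def build_api_url(region, per_page=20, tms=None):
    """Build the estates API URL from the parameters seen in the XHR call."""
    if tms is None:
        # Assumption: tms is a millisecond timestamp (possibly a cache buster).
        tms = int(time.time() * 1000)
    params = {
        "category_main_cb": 1,
        "category_type_cb": 1,
        "per_page": per_page,
        "region": region,
        "tms": tms,
    }
    return API_BASE + "?" + urlencode(params)


def parse_estates(json_text):
    """Yield one dict per listing found in the JSON body.

    Assumption: the key names below are a guess at the payload layout;
    missing keys simply come back as None.
    """
    data = json.loads(json_text)
    for estate in data.get("_embedded", {}).get("estates", []):
        yield {
            "name": estate.get("name"),
            "price": estate.get("price"),
            "locality": estate.get("locality"),
        }
```

Inside a Scrapy spider, this could be wired up by yielding scrapy.Request(build_api_url("brno"), callback=self.parse) from start_requests and having parse yield from parse_estates(response.text). No JavaScript rendering, Docker or Splash is needed, so this should work on ARM.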