python web-scraping scrapy scrapy-splash

Simplest/Beginner-friendly method to make Scrapy render JavaScript content


Considering this website here: https://dlnr.hawaii.gov/dsp/parks/hawaii/akaka-falls-state-park/

I'm looking to scrape the content under the headings on the right. Here is the sample code I tried with Requests and BeautifulSoup. It prints only empty strings, because Requests doesn't execute the JavaScript that fills in that content. Scrapy with default settings won't find it either.

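import requests as req
from bs4 import BeautifulSoup as bs

# Fetch the raw HTML; Requests does not execute any JavaScript.
r = req.get('https://dlnr.hawaii.gov/dsp/parks/hawaii/akaka-falls-state-park/').text
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = bs(r, 'html.parser')

# Locate the "Facilities" heading (string= replaces the deprecated text=).
par = soup.find('h3', string='Facilities')

# Print whatever follows the heading; in the raw HTML this is empty.
for sib in par.next_siblings:
    print(sib.text)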

I would like to know the easiest way for Scrapy to render JavaScript. Working through the responses in dev tools seems like too much work, especially when scraping multiple elements automatically. scrapy-splash seems a bit complicated and scrapy-selenium is no longer actively maintained, but I'm open to both of these options.

Would appreciate any help. Thanks.


Solution

  • Scrapy has no built-in way to render JavaScript. The easiest option is the scrapy-splash plugin.

    Data missing from the initial HTML usually means that it is loaded by a separate request. A careful look at the requests in Chrome developer tools quickly reveals that request. If you invest the time to understand how this UI works (where the numbers 57871 and 1621203973679 in the second request come from), you won't need to render anything at all.
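
As a rough sketch of that last point: 1621203973679 has the shape of a Unix timestamp in milliseconds (it decodes to 16 May 2021), so it may simply be a cache-busting value generated by the page's script. That is a guess, not something confirmed here. 57871 is assumed to be an identifier you would have to find in the initial HTML. The endpoint and the parameter names `id` and `_` below are placeholders; the real ones have to be copied from the request shown in dev tools.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical placeholder: copy the real request URL from the browser's dev tools.
DATA_ENDPOINT = "https://example.invalid/data"


def decode_ms_timestamp(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def current_ms_timestamp() -> int:
    """Return the current time as integer milliseconds since the Unix epoch."""
    return int(datetime.now(tz=timezone.utc).timestamp() * 1000)


def build_data_url(item_id: int, stamp_ms: int) -> str:
    """Build the second request's URL; the names 'id' and '_' are assumptions."""
    return f"{DATA_ENDPOINT}?{urlencode({'id': item_id, '_': stamp_ms})}"


if __name__ == "__main__":
    # Check whether the second number really looks like a recent timestamp.
    print(decode_ms_timestamp(1621203973679).isoformat())
    # Build a fresh URL the way the page's script presumably would.
    print(build_data_url(57871, current_ms_timestamp()))
```

If the guess holds, the resulting URL can be fetched with Requests or an ordinary scrapy.Request and parsed directly, with no rendering step.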