Tags: scrapy, splash-screen, scrapy-splash

scrapy splash: set input value?


I've successfully been able to load JavaScript-generated HTML with scrapy-splash. Now I want to set a couple of input values which are not part of a form. As soon as I enter a value, the content on the site changes. I haven't found a way to set the input values and re-scrape the adjusted HTML. Is this possible?

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'https://example.com',
    )

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 3}
                }
            })

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'screener-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Solution

  • You need to put the interaction inside a lua_source script, as someone suggested in the comments. Here is an example that clicks a button:

    script = """
            function main(splash)
               local url = splash.args.url
               assert(splash:go(url))

               -- getElementsByClassName returns a collection, so click the first match
               assert(splash:runjs('document.getElementsByClassName("nameofbutton")[0].click()'))
               assert(splash:wait(0.75))

               -- return the result as a JSON object
               return {
                   html = splash:html()
               }
            end
            """
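    Since the original question is about setting input values rather than clicking a button, a similar script can set a field directly with splash:runjs. This is only a sketch: the element id "search-box", the value "my value", and the dispatched 'input' event are assumptions about the target page, which may instead listen for 'change' or 'keyup' events.

    ```python
    # Sketch: assumes the page has an input with id "search-box" and reacts
    # to an 'input' event; adjust the selector and event for the real page.
    script = """
            function main(splash)
               assert(splash:go(splash.args.url))
               assert(splash:wait(1))

               -- set the input's value, then fire an 'input' event so the
               -- page re-renders the same way it would after real typing
               assert(splash:runjs([[
                   var el = document.getElementById("search-box");
                   el.value = "my value";
                   el.dispatchEvent(new Event("input", {bubbles: true}));
               ]]))
               assert(splash:wait(0.75))

               -- return the adjusted page so the spider can re-scrape it
               return {
                   html = splash:html()
               }
            end
            """
    ```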
    

    Then execute the script like this:

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse_item, meta={
                'splash': {
                    'args': {'lua_source': self.script},
                    'endpoint': 'execute',
                }
            })
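
    Because the Lua script returns a JSON object, the 'execute' endpoint hands scrapy-splash a JSON response; the decoded body should then be available on the response's .data attribute (a SplashJsonResponse), from which the callback can read the rendered HTML back out. A minimal sketch of that extraction step, with the spider usage shown as a comment since it needs a live Splash instance:

    ```python
    def html_from_splash(data):
        """Pull the rendered page out of the JSON object the Lua script returns.

        Sketch: the script returned {html = splash:html()}, so the decoded
        JSON is a dict with an 'html' key holding the page source.
        """
        return data["html"]

    # In the spider (assuming scrapy-splash's SplashJsonResponse):
    # def parse_item(self, response):
    #     html = html_from_splash(response.data)
    #     # ...feed `html` to a Selector or save it as in parse() above
    ```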