python-3.x scrapy scrapy-splash

scrapy + splash : not rendering full page javascript data


I am exploring Scrapy with Splash and trying to scrape all the product (pants) data, with product ID, name, and price, from the Gap e-commerce site. When I check the page in the Splash web UI, not all of the dynamic product data is loaded: only 16 items load on every request, and I have no idea why. I tried the following options, without success:

  • Increasing the wait time up to 20 seconds
  • Starting the Docker container with "--disable-private-mode"
  • Using a lua_script to scroll the page
  • Using the full-viewport option, splash:set_viewport_full()

from scrapy_splash import SplashRequest

# Lua script: load the page, then scroll to the bottom several times
lua_script2 = """
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 2.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
"""

    # (inside the spider method that generates requests)
    yield SplashRequest(
        url,
        self.parse_product_contents,
        endpoint='execute',
        args={
            'lua_source': lua_script2,
            'wait': 5,
        },
    )
 

Can anyone shed some light on this behavior? P.S. I am using the Scrapy framework and can parse the product information (item ID, name, and price) from the render.html output, but render.html contains only 16 items.


Solution

  • I updated the script as follows:

    function main(splash)
        local num_scrolls = 10
        local scroll_delay = 2.0
        -- Set a tall viewport before loading the page
        splash:set_viewport_size(1980, 8020)
        local scroll_to = splash:jsfunc("window.scrollTo")
        local get_body_height = splash:jsfunc(
            "function() {return document.body.scrollHeight;}"
        )
        assert(splash:go(splash.args.url))
        -- splash:set_viewport_full()
        splash:wait(10)
        -- Close the email-subscribe popup, which otherwise blocks scrolling
        splash:runjs("jQuery('span.icon-x').click();")
        splash:wait(1)
        -- Scroll to the bottom repeatedly so the page loads more products
        for _ = 1, num_scrolls do
            scroll_to(0, get_body_height())
            splash:wait(scroll_delay)
        end

        -- Wait for the remaining content to finish loading
        splash:wait(30)

        return {
            png = splash:png(),
            html = splash:html(),
            har = splash:har()
        }
    end
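
One thing worth checking with the script above: its explicit waits add up to about 61 seconds (10 + 1 + 10 × 2 + 30), which is more than Splash's documented default render timeout of 30 seconds. The sketch below, in Python since the request is sent from Scrapy, totals the waits and builds an `args` dict with a matching `timeout`. The helper names and the 20-second `margin` are hypothetical and not part of the original answer.

```python
# Wait values copied from the Lua script in the answer.
INITIAL_WAIT = 10    # splash:wait(10) after splash:go
POPUP_WAIT = 1       # splash:wait(1) after closing the popup
NUM_SCROLLS = 10     # num_scrolls
SCROLL_DELAY = 2.0   # scroll_delay
FINAL_WAIT = 30      # splash:wait(30) before returning


def total_wait_seconds():
    """Sum of every explicit splash:wait() call in the script."""
    return INITIAL_WAIT + POPUP_WAIT + NUM_SCROLLS * SCROLL_DELAY + FINAL_WAIT


def build_splash_args(lua_source, margin=20):
    """Build the `args` dict for SplashRequest(endpoint='execute').

    `margin` is an assumed allowance (in seconds) for page load and
    rendering on top of the explicit waits; tune it for the target site.
    """
    return {
        "lua_source": lua_source,
        "timeout": total_wait_seconds() + margin,
    }
```

The resulting dict could be passed as `args=` to `SplashRequest(..., endpoint='execute')`. Splash also caps `timeout` at a server-side maximum (90 seconds by default according to its docs), which can be raised by starting Splash with `--max-timeout`.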
    

    I ran it in my local Splash instance. The PNG does not render correctly, but the HTML contains the last product on the page.

    [Image: last image on page]

    [Image: Splash-rendered HTML]
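
Because the script returns a table rather than a plain HTML string, the Scrapy callback receives JSON instead of a page body. A minimal sketch of unpacking it is shown here, assuming scrapy-splash exposes the decoded JSON as `response.data` and that Splash base64-encodes the binary PNG inside that JSON. The function name `unpack_splash_result` is hypothetical.

```python
import base64


def unpack_splash_result(data):
    """Split the decoded JSON returned by the Lua script into its parts.

    `data` is expected to look like {"html": "...", "png": "<base64>", "har": {...}}.
    Returns (html_text, png_bytes); png_bytes is empty if no PNG was returned.
    """
    html = data.get("html", "")
    png_b64 = data.get("png")
    png_bytes = base64.b64decode(png_b64) if png_b64 else b""
    return html, png_bytes
```

In a spider callback this might be used as `html, png = unpack_splash_result(response.data)`, after which `png` can be written to a file opened in binary mode to inspect what Splash actually rendered.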

    The only issue was that the page would not scroll while the email-subscribe popup was open, so I added code to close it.
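
The popup-closing line in the script relies on the page exposing `jQuery`. As a hedged alternative, the sketch below builds an equivalent JavaScript snippet in Python that uses `document.querySelector` and does nothing when the popup is absent. The function name `close_popup_js` and the idea of passing the snippet as a custom `close_js` argument are assumptions for illustration; `span.icon-x` is the selector from the answer.

```python
import json


def close_popup_js(selector):
    """Return JS that clicks the first element matching `selector`, if any.

    json.dumps() quotes the selector so it is a valid JS string literal.
    """
    return (
        "(function () {"
        "var el = document.querySelector(%s);"
        "if (el) { el.click(); }"
        "})();" % json.dumps(selector)
    )
```

The snippet could be sent along with the request, for example `args={'lua_source': ..., 'close_js': close_popup_js('span.icon-x')}`, and run from Lua with `splash:runjs(splash.args.close_js)`.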