python, web-scraping, scrapy, scrapy-splash

How to retrieve from splash if a list was returned?


I am following the example provided in the Splash source: https://github.com/scrapinghub/splash/blob/master/splash/examples/render-multiple.lua

In that Lua script, a Lua table is returned instead of a JSON object.

How can I return and retrieve an array/list instead of a table/dictionary from a Lua script when using scrapy-splash?


Solution

  • If you're using scrapy-splash, the decoded result is available as response.data (see https://github.com/scrapy-plugins/scrapy-splash#responses). To access the PNG data for google.com, you can do something like this:

    import base64
    # ...
        # inside your spider class
        def parse_result(self, response):
            # the key is the URL used in the Lua script
            img = base64.b64decode(response.data["www.google.com"])
            # ...
    

    The linked script returns a {"<url>": "<base64 png data>"} mapping, not an array.

    If you want to return an array, modify the script to use integer keys and treat.as_array:

    local treat = require('treat')

    function main(splash, args)
      -- the documented syntax for calling Splash methods uses ':', not '.'
      splash:set_viewport_size(800, 600)
      splash:set_user_agent('Splash bot')
      local example_urls = {"www.google.com", "www.bbc.co.uk", "scrapinghub.com"}
      local urls = args.urls or example_urls
      local results = {}
      for i, url in ipairs(urls) do
        local ok, reason = splash:go("http://" .. url)
        if ok then
          splash:wait(0.2)
          -- integer keys, so the table can be returned as an array
          results[i] = splash:png()
        end
      end
      -- mark the table as an array so it is serialized as a list
      return treat.as_array(results)
    end
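The script reads its URL list from `args.urls`, so the list has to be sent along with the request. A minimal sketch of assembling those arguments, assuming the script is sent to Splash's `execute` endpoint through scrapy-splash's `SplashRequest`; the helper name `build_splash_args` is hypothetical and not part of either library:

```python
def build_splash_args(lua_source, urls=None):
    """Build the `args` dict for a request to Splash's `execute` endpoint.

    Hypothetical helper for illustration only.
    """
    args = {"lua_source": lua_source}
    if urls:
        # Exposed to the Lua script as `args.urls`; when omitted, the script
        # above falls back to its built-in example_urls.
        args["urls"] = list(urls)
    return args


# In a spider, the dict would be passed along these lines (not executed here):
#   yield SplashRequest(url, self.parse_result, endpoint="execute",
#                       args=build_splash_args(LUA_SCRIPT, ["www.google.com"]))
```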
    

    Then you can access the data by index:

    import base64
    # ...
        # inside your spider class
        def parse_result(self, response):
            # response.data is now a list; Python indexes it from 0
            img = base64.b64decode(response.data[0])
            # ...
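To decode every screenshot rather than only the first, the list can be walked in one pass. This is a minimal sketch, not part of scrapy-splash: `decode_png_list` is a hypothetical helper, and it assumes that a page which failed to load may appear as an empty (`None`) slot in the list, which the answer above does not confirm.

```python
import base64


def decode_png_list(items):
    """Decode a list of base64 strings into (index, bytes) pairs.

    Hypothetical helper. The original index is kept so that each image
    can still be matched to the URL at the same position.
    """
    decoded = []
    for index, item in enumerate(items):
        if item is None:
            # Assumption: a page that failed to load leaves an empty slot.
            continue
        decoded.append((index, base64.b64decode(item)))
    return decoded
```

Inside `parse_result` this would be called as `decode_png_list(response.data)`.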