docker, web-crawler, tor, scraper, scrapy-splash

Running Tor and Splash in the same Docker container


I am working on a scraper that collects coupon codes from various websites. I am using Python Scrapy, and I need the Splash browser because the coupon codes are shown in popups.

Now I want to route the requests through Tor as a proxy, but I am not able to run Tor and the Splash browser in the same Docker container.

I am running Splash in Docker as:

sudo docker run -p 8050:8050 scrapinghub/splash;

Some people suggest running Tor and Splash in separate Docker containers and connecting them. However, I have not been able to find a way to do that.

I tried running Tor in another Docker container as:

sudo docker run -it -p 8118:8118 -p 9050:9050 -d dperson/torproxy

I am sending the request like:

def start_requests(self):
    url = 'http://www.example.com/some-url'
    yield SplashRequest(
        url,
        self.parse,
        endpoint='execute',
        args={'lua_source': LUA_SCRIPT,
              'wait': 2})

and my LUA_SCRIPT is:

LUA_SCRIPT = """ function main(splash)
    splash:on_request(function(request)
        request:set_proxy{
            host = "localhost",
            port = 9050,
        }
    end)
    splash.images_enabled = false
    assert(splash:go{splash.args.url})
    splash:wait(splash.args.wait)           
    return splash:html()
end"""

Can anyone suggest how I should use Splash with Tor? (Without Tor, everything works fine.)


Solution

  • I found the solution myself by following this guide: https://www.sachsenhofer.io/install-splash-use-tor-privoxy-docker-cloud-stack/

    Anyone trying to do something similar can follow the link above.
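
    For readers who cannot open the link, here is a minimal sketch of the general idea, not the linked guide's exact setup. Inside the Splash container, "localhost" refers to the Splash container itself, so the proxy host in the Lua script has to be an address at which the Tor container is reachable. The sketch assumes both containers are attached to the same user-defined Docker network and that the Tor container was started with the hypothetical name "torproxy". It also assumes port 8118 is the HTTP (Privoxy) proxy and port 9050 is the SOCKS5 proxy, matching the ports published in the question. The helper build_lua_script is made up for illustration.

```python
# Sketch only. "torproxy" is a hypothetical container name, assumed to be
# resolvable from the Splash container over a shared user-defined Docker network.
LUA_TEMPLATE = """
function main(splash)
    splash:on_request(function(request)
        request:set_proxy{
            host = "%(host)s",
            port = %(port)d,
            type = "%(type)s",
        }
    end)
    splash.images_enabled = false
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)
    return splash:html()
end
"""


def build_lua_script(host="torproxy", port=8118, proxy_type="HTTP"):
    """Return a Splash Lua script that sends every request through a proxy.

    host, port and proxy_type are assumptions about the local setup:
    8118/HTTP for the Privoxy front end, 9050/SOCKS5 for Tor directly.
    """
    if proxy_type not in ("HTTP", "SOCKS5"):
        raise ValueError("proxy_type must be 'HTTP' or 'SOCKS5'")
    return LUA_TEMPLATE % {"host": host, "port": int(port), "type": proxy_type}


# Default: go through the HTTP proxy on the Tor container.
LUA_SCRIPT = build_lua_script()

# Alternative: talk to Tor's SOCKS5 port directly.
LUA_SCRIPT_SOCKS = build_lua_script(port=9050, proxy_type="SOCKS5")
```

    The resulting LUA_SCRIPT can be passed to SplashRequest exactly as in the question (args={'lua_source': LUA_SCRIPT, 'wait': 2}). If the proxy is left on port 9050, the type has to be "SOCKS5", because set_proxy defaults to an HTTP proxy.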