I am working on a scraper that collects coupon codes from various websites. I am using Python with Scrapy, and I had to add the Splash browser because the coupon codes are shown in popups.
Now I want to route the requests through Tor as a proxy, but I am not able to run Tor and Splash in the same Docker container.
I am running splash on docker as:
sudo docker run -p 8050:8050 scrapinghub/splash;
Some people suggest running Tor and Splash in separate Docker containers and connecting them. However, I have not been able to find a way to do that.
I tried running Tor in another Docker container as:
sudo docker run -it -p 8118:8118 -p 9050:9050 -d dperson/torproxy
I am sending the request like:
def start_requests(self):
    url = 'http://www.example.com/some-url'
    yield SplashRequest(
        url,
        self.parse,
        endpoint='execute',
        args={'lua_source': LUA_SCRIPT,
              'wait': 2})
and my LUA_SCRIPT is:
LUA_SCRIPT = """
function main(splash)
    splash:on_request(function(request)
        request:set_proxy{
            host = "localhost",
            port = 9050,
        }
    end)
    splash.images_enabled = false
    assert(splash:go{splash.args.url})
    splash:wait(splash.args.wait)
    return splash:html()
end
"""
Can anyone suggest how I should use Splash with Tor? (Without Tor, everything works fine.)
I found the solution myself by following this guide: https://www.sachsenhofer.io/install-splash-use-tor-privoxy-docker-cloud-stack/
Anyone looking for something similar can follow the link above.
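
For anyone who wants the general idea without reading the guide, here is a minimal sketch of the separate-containers approach. It is not necessarily the exact setup from the link. Two things in the original script are likely to get in the way: inside the Splash container, "localhost" refers to the Splash container itself rather than to the Tor container, and port 9050 is Tor's SOCKS port, while request:set_proxy assumes an HTTP proxy unless a type is given. The sketch assumes both containers are attached to the same user-defined Docker network and that the Tor container was started with the hypothetical name "torproxy" (both are assumptions, not part of the commands above). The helper build_lua_script is also made up for illustration.

```python
def build_lua_script(proxy_host: str, proxy_port: int, proxy_type: str = "HTTP") -> str:
    """Return a Splash Lua script that sends every request through the given proxy.

    proxy_host is assumed to be resolvable from inside the Splash container,
    e.g. the Tor container's name on a shared Docker network.
    """
    # Doubled braces ({{ and }}) are literal braces in an f-string,
    # so the generated Lua contains single { and }.
    return f"""
function main(splash)
    splash:on_request(function(request)
        request:set_proxy{{
            host = "{proxy_host}",
            port = {proxy_port},
            type = "{proxy_type}",
        }}
    end)
    splash.images_enabled = false
    assert(splash:go{{splash.args.url}})
    splash:wait(splash.args.wait)
    return splash:html()
end
"""


# "torproxy" is a hypothetical container name; replace it with your own.
# Option 1: go through Privoxy (HTTP proxy) on port 8118.
LUA_SCRIPT = build_lua_script("torproxy", 8118)

# Option 2: talk to Tor's SOCKS port directly.
LUA_SCRIPT_SOCKS = build_lua_script("torproxy", 9050, proxy_type="SOCKS5")
```

Either string can be passed as 'lua_source' in the SplashRequest shown in the question; the rest of the spider stays the same.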