I'm working on a project that needs to access a webpage using mechanize with a socks proxy. After digging a bit, I came up with the following code:
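import socket
import socks  # the SocksiPy / PySocks module

def create_connection(address, timeout=None, source_address=None):
    # timeout and source_address are accepted for signature compatibility but ignored
    sock = socks.socksocket()
    sock.connect(address)
    return sock

CRAWLER_SOCKS_PROXY_HOST = '0.0.0.0'
CRAWLER_SOCKS_PROXY_PORT = 1080
# From here on, every socket created in this process goes through the SOCKS5 proxy
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, CRAWLER_SOCKS_PROXY_HOST, CRAWLER_SOCKS_PROXY_PORT)
socket.socket = socks.socksocket
socket.create_connection = create_connection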
This does let me access the webpage through the SOCKS proxy I created with ssh -f -N -D 1080 user@host.
After doing that, I realized that Celery couldn't connect to my Redis broker and kept giving "Connection closed unexpectedly" errors, so I killed the ssh process and confirmed that the SOCKS proxy configuration was the culprit. The error obtained is: Cannot connect to redis://127.0.0.1:6379//: Error connecting to SOCKS5 proxy 0.0.0.0:1080: [Errno 111] Connection refused.
So, my question is: is there a way to set a SOCKS proxy for mechanize without affecting the other parts of the code? I suspect that if I use the requests module, it will also go through the proxy, which is not my intention. I just want the proxy for a specific call.
I solved this by putting the
CRAWLER_SOCKS_PROXY_HOST = '0.0.0.0'
CRAWLER_SOCKS_PROXY_PORT = 1080
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, CRAWLER_SOCKS_PROXY_HOST, CRAWLER_SOCKS_PROXY_PORT)
socket.socket = socks.socksocket
socket.create_connection = create_connection
lines inside the function that needs to make the call through the SOCKS proxy, rather than in the global scope of the module. This way, Celery seems to be able to connect to the broker (and also to reconnect after quitting and launching again).
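One caveat: once that function has run, socket.socket and socket.create_connection stay patched for the rest of the process, so any connection opened afterwards would still go through the proxy. A possible refinement is to restore the originals as soon as the proxied call finishes. The following is only a sketch under assumptions: temporary_socket_patch and fetch_through_proxy are hypothetical names I made up, and the host and port defaults are simply the values from the question. The patch is still process-wide while the block is active, so other threads opening sockets in that window would also be routed through the proxy.

```python
import contextlib
import socket


@contextlib.contextmanager
def temporary_socket_patch(socket_class, create_connection):
    """Swap socket.socket and socket.create_connection, then restore them.

    Hypothetical helper: the originals are put back even if the body raises.
    """
    original_socket = socket.socket
    original_create_connection = socket.create_connection
    socket.socket = socket_class
    socket.create_connection = create_connection
    try:
        yield
    finally:
        socket.socket = original_socket
        socket.create_connection = original_create_connection


def fetch_through_proxy(url, host='0.0.0.0', port=1080):
    """Hypothetical example: open one URL with mechanize via the SOCKS5 proxy."""
    # Imported here so that only this call depends on mechanize and socks.
    import mechanize
    import socks

    def create_connection(address, timeout=None, source_address=None):
        # Same replacement as in the question; extra arguments are ignored.
        sock = socks.socksocket()
        sock.connect(address)
        return sock

    # Only affects socksocket instances, so it is harmless outside the patch.
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, host, port)

    with temporary_socket_patch(socks.socksocket, create_connection):
        browser = mechanize.Browser()
        return browser.open(url).read()
```

Calling fetch_through_proxy('http://example.com/') would then use the proxy for that one request, while Celery, requests, and everything else keep using the normal socket module before and after the call.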