python proxy tunneling

Simple local tunneling proxy that relays requests to a list of external proxies


I'm new here!

And I'm stuck, unfortunately.

I need a single proxy, reachable at one IP:port that can be entered in a browser, for scraping activities that have to get past the Cloudflare firewall. Internally, each time the page is reloaded, it should relay/tunnel the request, unchanged, to another proxy selected at random from a list. The proxies on the list do not need login handling, so it 'should' be quite simple to implement. Unfortunately, I can't seem to implement it in my preferred language, PHP, and I'm an absolute beginner in Python. I wrote (and built from snippets) the following code. It starts without errors, and I can connect to the local Python proxy (according to the wget -v output), but nothing happens after that except a timeout. I tried it on Windows (firewall disabled) and on Debian. The external proxies tested OK too.

Can someone please help me out with this? Am I missing anything obvious as a complete Python beginner? Or does anyone already know of a simple tunneling proxy script that does what I need (relaying to rotating proxies)? Once the connection works, I would also like to add a MySQL query that picks a random proxy from a database, so the script should be customizable.

Or is that the completely wrong approach?

Thank You!

import socket
import socketserver
import select
import itertools

# proxylist
PROXY_LIST = ['xxx.xxx.xxx.220:3128', 'xxx.xxx.xxx.22:3128', 'xxx.xxx.xxx.105:3128']
proxy_pool = itertools.cycle(PROXY_LIST)

class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

class HTTPRequestHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(4096).strip()
        proxy = next(proxy_pool)
        hostname, port = proxy.split(':') 
        remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        remote.connect(("xxx.xxx.xxx.22", 3128))  # for testing a static proxy, not from above list
        remote.send(data)

        inputs = [self.request, remote]
        while True:
            readable, writable, exceptional = select.select(inputs, [], inputs, 60)
            if not readable and not writable and not exceptional:
                break

            for s in readable:
                if s is remote:
                    out = self.request
                else:
                    out = remote
                data = s.recv(4096)
                if data:
                    out.send(data)

            for s in exceptional:
                inputs.remove(s)
                s.close()

        remote.close()
        self.request.close()

if __name__ == "__main__":
    with ThreadingTCPServer(('localhost', 3128), HTTPRequestHandler) as server:
        server.serve_forever()
wget -v google.de -e use_proxy=yes -e http_proxy=127.0.0.1:3128

It connects to the proxy, but the response times out. The moment I kill the script, I get the following message:

Exception occurred during processing of request from ('127.0.0.1', 56714)
Traceback (most recent call last):
  File "xxx\Python\Python311\Lib\socketserver.py", line 691, in process_request_thread
    self.finish_request(request, client_address)
  File "xxx\Python\Python311\Lib\socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "xxx\Python\Python311\Lib\socketserver.py", line 755, in __init__
    self.handle()
  File "xxx\Test\test.py", line 36, in handle
    data = s.recv(4096)
           ^^^^^^^^^^^^

Solution

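  • Your version most likely hangs because of the .strip() on the first recv(): it removes the blank line (\r\n\r\n) that ends the HTTP request headers, so the upstream proxy keeps waiting for the rest of the request. Beyond that, I think we can simplify your code a bit by using the selectors module: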

    import selectors
    import socket
    import socketserver
    import itertools
    
    # proxylist
    PROXY_LIST = ['172.23.0.2:8888', '172.23.0.3:8888', '172.23.0.4:8888']
    proxy_pool = itertools.cycle(PROXY_LIST)
    
    class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        pass
    
    class HTTPRequestHandler(socketserver.BaseRequestHandler):
        def handle(self):
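            # Pick the next upstream proxy and open a TCP connection to it
            proxy = next(proxy_pool)
            hostname, port = proxy.split(':')
            remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            remote.connect((hostname, int(port)))
    
            # Both sockets and the selector are closed when the block exits
            with self.request, remote, selectors.DefaultSelector() as selector:
                selector.register(self.request, selectors.EVENT_READ)
                selector.register(remote, selectors.EVENT_READ)
    
                while True:
                    events = selector.select()
                    for key, _ in events:
                        infd = key.fileobj
                        outfd = remote if key.fileobj is self.request else self.request
    
                        data = infd.recv(1024)
                        if not data:
                            # One side closed the connection: stop relaying.
                            # A plain break would only leave the for loop and
                            # the while loop would then spin on the closed socket.
                            return
    
                        # sendall() keeps writing until the whole chunk is sent
                        outfd.sendall(data)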
    
    if __name__ == "__main__":
        with ThreadingTCPServer(('localhost', 3128), HTTPRequestHandler) as server:
            server.serve_forever()
    

    Using this code, I can run curl -x localhost:3128 example.com and it successfully fetches the remote URL, cycling through the list of proxies, one per connection.
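
The question asks for a randomly selected proxy and, later, for one picked from a MySQL table, while the code above cycles through the list in order. One way to keep that choice customizable is to hide it behind a small "picker" function. The following is only a sketch under assumptions: the names parse_proxy, random_picker and db_picker are made up for illustration, the table proxies(host, port) is hypothetical, and sqlite3 stands in for MySQL so the example runs without extra packages (MySQL spells the random ordering ORDER BY RAND() instead of ORDER BY RANDOM()).

```python
import random
import sqlite3
from contextlib import closing
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def parse_proxy(entry: str) -> Address:
    """Split a 'host:port' string into a (host, port) tuple."""
    host, _, port = entry.rpartition(':')
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {entry!r}")
    return host, int(port)


def random_picker(proxies: Iterable[str]) -> Callable[[], Address]:
    """Return a function that picks a random proxy from a fixed list."""
    pool = [parse_proxy(p) for p in proxies]
    if not pool:
        raise ValueError("proxy list is empty")
    return lambda: random.choice(pool)


def db_picker(db_path: str) -> Callable[[], Address]:
    """Return a function that picks a random row from a database.

    Assumes a hypothetical table: proxies(host TEXT, port INTEGER).
    A new connection is opened per call because sqlite3 connections
    should not be shared between the server's handler threads.
    """
    def pick() -> Address:
        with closing(sqlite3.connect(db_path)) as conn:
            row = conn.execute(
                "SELECT host, port FROM proxies ORDER BY RANDOM() LIMIT 1"
            ).fetchone()
        if row is None:
            raise LookupError("no proxies in database")
        return row[0], int(row[1])
    return pick
```

With something like pick_proxy = random_picker(PROXY_LIST) at module level, the handler could call hostname, port = pick_proxy() and pass remote.connect((hostname, port)) in place of the next(proxy_pool) and split(':') lines. Swapping in db_picker later would then not touch the relay loop.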