python python-3.x tor http.client

Python 3: http.client with Privoxy/Tor making bad requests


I'm trying to route http.client.HTTPConnection through Tor (via Privoxy on 127.0.0.1:8118), but I keep getting unexpected responses from every site I try. It's hard to describe, so here's an example of what I have:

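import http.client
from urllib.parse import urlencode

class Socket(http.client.HTTPConnection):
    def __init__(self, url):
        # Connect to the local Privoxy listener, then tunnel to the target host.
        super().__init__('127.0.0.1', 8118)
        # No port is given, so the tunnel targets the plain-HTTP default (80).
        self.set_tunnel(url)
        # Direct connection without the proxy: super().__init__(url)

    def get(self, url='/', params=None):
        # urlencode stands in for util.params_to_query, a helper not shown here.
        query = urlencode(params or {})
        if query:
            url += ('&' if '?' in url else '?') + query

        self.request('GET', url, headers={'Connection': 'keep-alive'})
        return self.getresponse().read().decode('utf-8')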

If I run this with:

sock = Socket('www.google.com')
print(sock.get())

I get:

<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<title>301 Moved</title></head><body>
<h1>301 Moved</h1>
The document has moved
<a href="http://www.google.com:8118/">here</a>.
</body></html>

Google is redirecting me to the URL I just requested, except with the Privoxy port appended. And it gets weirder: if I try https://check.torproject.org, I get:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Welcome to sergii!</title>
</head>
<body>
<h1>Welcome to sergii!</h1>

This is sergii, a system run by and for the <a href="http://www.torproject.org/">Tor Project</a>.
She does stuff.
What kind of stuff and who our kind sponsors are you might learn on
<a href="http://db.torproject.org/machines.cgi?host=sergii">db.torproject.org</a>.

<p>
</p><hr noshade=""/>
<font size="-1">torproject-admin</font>
</body>
</html>

If I don't go through Privoxy/Tor, I get exactly what a browser gets at http://www.google.com or http://check.torproject.org. I don't know what's going on here. I suspect the problem is in my Python code, because Firefox works fine through Tor, but I'm not sure.

The Privoxy log reads:

2015-06-27 19:28:26.950 7f58f4ff9700 Request: www.google.com:80/
2015-06-27 19:30:40.360 7f58f4ff9700 Request: check.torproject.org:80/

The Tor log shows nothing useful.


Solution

  • The cause turned out to be that I was connecting over http:// while those sites expected https://. The same code works correctly for sites that accept plain http://.
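
For illustration, here is a minimal sketch of what the HTTPS variant might look like. It assumes Privoxy is listening on 127.0.0.1:8118 as in the question. The function names open_https_tunnel, build_path and get are hypothetical, and urlencode stands in for the question's own params helper. It is not necessarily the exact code the asker ended up with.

```python
import http.client
from urllib.parse import urlencode


def open_https_tunnel(host, proxy_host="127.0.0.1", proxy_port=8118):
    """Return an HTTPSConnection that reaches `host` through an HTTP proxy."""
    # The socket is opened to the proxy; TLS is negotiated with `host` only
    # after the proxy has accepted the CONNECT request.
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=30)
    # Name the port explicitly so the tunnel target is host:443, not host:80.
    conn.set_tunnel(host, 443)
    return conn


def build_path(path="/", params=None):
    """Append URL-encoded query parameters to `path`."""
    query = urlencode(params or {})
    if not query:
        return path
    separator = "&" if "?" in path else "?"
    return path + separator + query


def get(host, path="/", params=None):
    """Fetch https://host/path through the proxy and return the decoded body."""
    conn = open_https_tunnel(host)
    try:
        conn.request("GET", build_path(path, params))
        return conn.getresponse().read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

With Privoxy and Tor running, calling get('check.torproject.org') should then go out over a tunnel to port 443 instead of port 80. No connection is made until request() is called, so the helpers can be constructed and inspected offline.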