I'm trying to use TOR with http.client.HTTPConnection, but for some reason I keep getting weird responses from everything. I'm not really sure exactly how to explain, so here's an example of what I have:
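import http.client

import util  # local helper module (not shown); params_to_query() turns a dict into a query string


class Socket(http.client.HTTPConnection):
    def __init__(self, url):
        # Connect to the local Privoxy listener, then tunnel through it to the target host.
        super().__init__('127.0.0.1', 8118)
        self.set_tunnel(url)
        #super().__init__(url)

    def get(self, url='/', params=None):
        params = util.params_to_query(params or {})
        if params:
            # Use '&' if the URL already carries a query string, otherwise start one with '?'.
            url += ('&' if '?' in url else '?') + params
        self.request('GET', url, headers={'Connection': 'keep-alive'})
        return self.getresponse().read().decode('utf-8')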
If I run this with:
sock = Socket('www.google.com')
print(sock.get())
I get:
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<title>301 Moved</title></head><body>
<h1>301 Moved</h1>
The document has moved
<a href="http://www.google.com:8118/">here</a>.
</body></html>
Google is redirecting me to the URL I just requested, except with the Privoxy port. And it gets weirder - if I try https://check.torproject.org, I get:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Welcome to sergii!</title>
</head>
<body>
<h1>Welcome to sergii!</h1>
This is sergii, a system run by and for the <a href="http://www.torproject.org/">Tor Project</a>.
She does stuff.
What kind of stuff and who our kind sponsors are you might learn on
<a href="http://db.torproject.org/machines.cgi?host=sergii">db.torproject.org</a>.
<p>
</p><hr noshade=""/>
<font size="-1">torproject-admin</font>
</body>
</html>
If I don't try to use Privoxy/TOR, I get exactly what your browser gets at http://www.google.com or http://check.torproject.org. I don't know what's going on here. I suspect the issue is with Python because I can use TOR with Firefox, but I don't really know.
Privoxy log reads:
2015-06-27 19:28:26.950 7f58f4ff9700 Request: www.google.com:80/
2015-06-27 19:30:40.360 7f58f4ff9700 Request: check.torproject.org:80/
TOR log has nothing useful to say.
This ended up being because I was connecting with http:// and those sites wanted https://. It does work correctly for sites that accept normal http://.
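For anyone hitting the same thing, here is a minimal sketch of the HTTPS variant, assuming Privoxy is still listening on 127.0.0.1:8118 as in the question. The helper name make_tunnel and its parameters are made up for illustration. The only real point is that the connection class and the tunnel port have to match the scheme: on a plain HTTPConnection, set_tunnel() falls back to port 80 when no port is given, which is what the Privoxy log above shows.

```python
import http.client


def make_tunnel(host, use_tls=True, proxy_host='127.0.0.1', proxy_port=8118):
    """Return an unconnected connection that tunnels to `host` through the proxy.

    Hypothetical helper: the proxy address defaults to the Privoxy listener
    from the question and may differ on other setups.
    """
    # HTTPSConnection sends CONNECT to the proxy in plain text and only then
    # starts TLS with the target host; HTTPConnection never starts TLS.
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(proxy_host, proxy_port)
    # Pass the port explicitly so the CONNECT request targets 443 for https://
    # sites instead of silently defaulting to 80.
    conn.set_tunnel(host, 443 if use_tls else 80)
    return conn
```

Nothing is sent until the first request, so something like conn = make_tunnel('check.torproject.org') followed by conn.request('GET', '/') and conn.getresponse() should behave like the Socket class above, just over TLS.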