Tags: python, proxy, screen-scraping, urllib2

Does httplib2 support HTTP proxies at all? SOCKS proxies work, but HTTP proxies do not


Here is my code. I cannot get any HTTP proxy to work, although SOCKS proxies (SOCKS4/5) work fine. urllib2 also works fine with proxies, so I am confused. Any ideas why? Thanks.

Code:

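    import socks          # standalone SocksiPy module, which httplib2 uses for proxy support
    import httplib2
    import BeautifulSoup  # BeautifulSoup 3; only needed for the commented-out parsing line

    httplib2.debuglevel = 4  # print connection debug output
    # socks.PROXY_TYPE_HTTP is 3 in SocksiPy; the named constant is clearer than a bare 3
    http = httplib2.Http(proxy_info=httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, '213.30.160.160', 80))

    main_url = 'http://cuil.com'

    response, content = http.request(main_url, 'GET')

    #html_content = BeautifulSoup.BeautifulSoup(content)

    print response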

Output:

connect: (cuil.com, 80)
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    response, content = http.request(main_url, 'GET')
  File "/home/kk/bin/pythonlib/httplib2/__init__.py", line 1129, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/kk/bin/pythonlib/httplib2/__init__.py", line 901, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/kk/bin/pythonlib/httplib2/__init__.py", line 862, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/usr/lib/python2.5/httplib.py", line 866, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.5/httplib.py", line 889, in _send_request
    self.endheaders()
  File "/usr/lib/python2.5/httplib.py", line 860, in endheaders
    self._send_output()
  File "/usr/lib/python2.5/httplib.py", line 732, in _send_output
    self.send(msg)
  File "/usr/lib/python2.5/httplib.py", line 699, in send
    self.connect()
  File "/home/kk/bin/pythonlib/httplib2/__init__.py", line 740, in connect
    self.sock.connect(sa)
  File "/home/kk/bin/pythonlib/socks.py", line 383, in connect
    self.__negotiatehttp(destpair[0],destpair[1])
  File "/home/kk/bin/pythonlib/socks.py", line 349, in __negotiatehttp
    raise HTTPError((statuscode,statusline[2]))
socks.HTTPError: (403, 'Forbidden')
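The traceback ends in socks.py's __negotiatehttp, so the 403 came from the proxy during the HTTP-mode handshake, before any request reached cuil.com. As far as I know, SocksiPy's HTTP mode opens a tunnel with a CONNECT request, whereas urllib2 sends an ordinary request with the full URL to the proxy. The sketch below contrasts the two request shapes. All function names are hypothetical, and toy_proxy_status is an assumed stand-in for a proxy configured like Squid's default (CONNECT allowed to port 443 only); real proxy policies vary.

```python
# Sketch contrasting the two ways a client can talk to an HTTP proxy.
# All names here are hypothetical, chosen for illustration only.
from urllib.parse import urlsplit, urlunsplit

# Assumption: ports the toy proxy accepts for CONNECT (Squid's default is 443 only).
SSL_PORTS = frozenset({443})


def tunnel_handshake(host: str, port: int) -> bytes:
    """First bytes a tunnelling client sends: ask the proxy for a raw TCP pipe."""
    return f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def forwarded_request(url: str, method: str = "GET") -> bytes:
    """A forwarded request: the full absolute URL is the request target."""
    parts = urlsplit(url)
    target = urlunsplit((parts.scheme, parts.netloc, parts.path or "/", parts.query, ""))
    return f"{method} {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n".encode("ascii")


def toy_proxy_status(request: bytes) -> int:
    """Hypothetical proxy policy: 403 for CONNECT to non-SSL ports, else 200."""
    request_line = request.split(b"\r\n", 1)[0]
    method, target, _version = request_line.split(b" ")
    if method == b"CONNECT":
        port = int(target.rsplit(b":", 1)[1])
        return 200 if port in SSL_PORTS else 403
    return 200


print(toy_proxy_status(tunnel_handshake("cuil.com", 80)))     # 403, like the traceback
print(toy_proxy_status(forwarded_request("http://cuil.com")))  # 200
```

Under this toy policy, a CONNECT tunnel to cuil.com:80 is refused while the forwarded GET is accepted, which would be consistent with SOCKS and urllib2 working while httplib2's HTTP proxy mode fails.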

Solution

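  • This looks like an open issue in httplib2: http://code.google.com/p/httplib2/issues/detail?id=38
    Note that the 403 in the traceback is returned by the proxy itself during socks.py's HTTP negotiation, not by cuil.com.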
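Since urllib2 worked with proxies in the question, one possible workaround is to fetch plain http:// URLs through urllib and keep httplib2 for SOCKS. This is a sketch in Python 3, where urllib2 became urllib.request; build_proxy_opener and fetch are hypothetical helpers, and the proxy address is simply the one from the question. For http:// URLs, urllib.request's ProxyHandler sends the request to the proxy with the absolute URL rather than opening a CONNECT tunnel.

```python
import urllib.request

# Proxy address taken from the question; assumed to be a plain forwarding HTTP proxy.
PROXY_URL = "http://213.30.160.160:80"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes http:// requests through proxy_url."""
    # Passing an explicit ProxyHandler replaces build_opener's default one, so the
    # proxy address comes from this dict rather than the http_proxy env variable.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, proxy_url: str = PROXY_URL, timeout: float = 10.0):
    """Fetch url through the proxy; returns (status code, body bytes)."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Usage would be `status, body = fetch("http://cuil.com")`; this needs a reachable proxy, so no output is shown here.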