
How do I fix this IOError: [Errno socket error] [Errno 11004]?


This simple Python 3 script:

import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)

raises this exception:

Traceback (most recent call last):
  File "C:/Users/ricardo/Desktop/Google-Scholar/BibTex/test2.py", line 8, in <module>
    urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1569, in retrieve
    fp = self.open(url, data)
  File "C:\Python32\lib\urllib\request.py", line 1541, in open
    raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
  File "C:\Python32\lib\urllib\request.py", line 1537, in open
    return getattr(self, name)(url)
  File "C:\Python32\lib\urllib\request.py", line 1715, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python32\lib\urllib\request.py", line 1695, in _open_generic_http
    http_conn.request("GET", selector, headers=headers)
  File "C:\Python32\lib\http\client.py", line 967, in request
    self._send_request(method, url, body, headers)
  File "C:\Python32\lib\http\client.py", line 1005, in _send_request
    self.endheaders(body)
  File "C:\Python32\lib\http\client.py", line 963, in endheaders
    self._send_output(message_body)
  File "C:\Python32\lib\http\client.py", line 808, in _send_output
    self.send(msg)
  File "C:\Python32\lib\http\client.py", line 746, in send
    self.connect()
  File "C:\Python32\lib\http\client.py", line 724, in connect
    self.timeout, self.source_address)
  File "C:\Python32\lib\socket.py", line 386, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed

I can open the url that results from the print statement just fine:

http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0

What is causing this? I tried changing http:// to http:/// (three slashes), but the same exception is raised.


Solution

  • Here's your problem:

    urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
    

    You're adding the http://scholar.google.com part twice (url already starts with http://scholar.google.com). As a result, urllib parses the host name as scholar.google.comhttp. That domain does not exist, so the name lookup fails, which is exactly what the getaddrinfo error says.

    Just request url by itself: urllib.request.urlretrieve(url, filename).

    Handy hint to find this kind of thing faster in the future: when adding a print statement for debugging, be sure to print the actual value you are using in the command you're debugging. You would have found this in approximately two seconds if your print statement had also concatenated the base URL.
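
To make the diagnosis concrete, here is a minimal sketch that parses both strings with urllib.parse.urlsplit and prints the host name each one resolves to. The names doubled, hostname_of, and download are made up for this illustration and are not part of the original script. Nothing is fetched unless you call download yourself.

```python
import urllib.request
from urllib.parse import urlsplit

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link

# The string the original script actually passed to urlretrieve
# (hypothetical variable name, used only for this illustration).
doubled = "http://scholar.google.com" + url


def hostname_of(u):
    """Return the host name that urllib's parser extracts from a URL string."""
    return urlsplit(u).hostname


# Print the value that is really used in the call, not just a piece of it.
print(hostname_of(doubled))  # scholar.google.comhttp
print(hostname_of(url))      # scholar.google.com


def download(u, filename):
    """Fetch u into filename. Not called here because it needs network access."""
    return urllib.request.urlretrieve(u, filename)
```

With the fix applied, the call would be download(url, "cite0.bib"), passing url alone without prepending the base again.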