Search code examples
pythoncsvurllib

Getting error while downloading csv files from GitHub (urllib.request)


I have been working on this script for a couple weeks now and I just finished, everything was working perfectly fine. I returned to the script and tried to run it but I ran into an error and have know idea what it means and how to solve it. There error occurs quite early in the script when I am downloading some csv files containing CoronaVirus data. I was using urllib.request to do this and I got an error

  File "Run.py", line 13, in <module>
    urllib.request.urlretrieve(url, filename="time_series_covid19_recoveredGlobal.csv")
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1362, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1322, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

and another one

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 1319, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1392, in connect
    super().connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 915, in connect
    self.sock = self._create_connection(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

I am wondering why this is happening. Here is what I have:

import urllib.request
url = "https://raw.githubuserxxxent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
urllib.request.urlretrieve(url, filename="time_series_covid19_recoveredGlobal.csv")
url = "https://raw.githubuserxxxent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
urllib.request.urlretrieve(url, filename="time_series_covid19_confirmed_global.csv")
url = "https://raw.githubuserxxxent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
urllib.request.urlretrieve(url, filename="time_series_covid19_deaths_global.csv")

Thank You!


Solution

  • Your URLs have typos:

    Instead of the domains being githubuserxxxent.com, they should be githubusercontent.com.

    Note that githubuserxxxent.com returns an Address not found error when trying to access it in the browser.

    Maybe you did a find and replace in your code and that's why this happened.

    Fixed code:

    import urllib.request
    url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
    urllib.request.urlretrieve(url, filename="time_series_covid19_recoveredGlobal.csv")
    url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
    urllib.request.urlretrieve(url, filename="time_series_covid19_confirmed_global.csv")
    url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
    urllib.request.urlretrieve(url, filename="time_series_covid19_deaths_global.csv")