python machine-learning artificial-intelligence naivebayes

Udacity: Can't download dataset "enron_mail_20150507.tar.gz" in ud120-projects


I'm not able to download "enron_mail_20150507.tar.gz" by running "python startup.py". I got the following error and don't know how to fix it.

    downloading the Enron dataset (this may take a while)
    to check on progress, you can cd up one level, then execute <ls -lthr>
    Enron dataset should be last item on the list, along with its current 
    size
    download will complete at about 423 MB
    Traceback (most recent call last):
      File "startup.py", line 36, in <module>
        urllib.urlretrieve(url, filename="../enron_mail_20150507.tar.gz")
      File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
        return opener.retrieve(url, filename, reporthook, data)
      File "C:\Python27\lib\urllib.py", line 245, in retrieve
        fp = self.open(url, data)
      File "C:\Python27\lib\urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "C:\Python27\lib\urllib.py", line 350, in open_http
        h.endheaders(data)
      File "C:\Python27\lib\httplib.py", line 1049, in endheaders
        self._send_output(message_body)
      File "C:\Python27\lib\httplib.py", line 893, in _send_output
        self.send(msg)
      File "C:\Python27\lib\httplib.py", line 855, in send
        self.connect()
      File "C:\Python27\lib\httplib.py", line 832, in connect
        self.timeout, self.source_address)
      File "C:\Python27\lib\socket.py", line 557, in create_connection
        for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    IOError: [Errno socket error] [Errno 11001] getaddrinfo failed

I tried changing the URL in "startup.py" to "http://www.cs.cmu.edu/~enron/enron_mail_20150507.tar.gz", but that does not work either. If anybody has downloaded it using Python on Windows, please show me how. I would really appreciate it.
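The last line of the traceback ("getaddrinfo failed", Errno 11001 on Windows) suggests that the hostname lookup failed before any data was transferred, which usually points at DNS, proxy, or connectivity settings rather than at the script itself. A minimal sketch for checking that separately, assuming the host is "www.cs.cmu.edu" as in the URL above; the helper name "can_resolve" is hypothetical and not part of startup.py:

```python
import socket


def can_resolve(host, port=80):
    """Return True if the hostname resolves to at least one address.

    Hypothetical helper for diagnosis only; it performs the same kind of
    lookup that fails at the bottom of the traceback.
    """
    try:
        results = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # Raised when the name cannot be resolved (e.g. no DNS, no network).
        return False
    return len(results) > 0


# Example use (requires network access):
#     can_resolve("www.cs.cmu.edu")
```

If this returns False for the download host but a browser on the same machine can reach it, the Python process is probably not using the same proxy or network settings as the browser.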

I also tried downloading it manually, but the download was still running after 1.1 GB had been received, so I got scared and stopped it... lol XD. How large is the "enron_mail_20150507.tar.gz" file? Where do I put the file after it is downloaded? In ud120-projects?

Please help me. I'm stuck.


Solution

  • The problem is solved. I downloaded the file manually through the link in startup.py. The file size is 1.69 GB (zipped) and 2.23 GB (unzipped).
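As for where the file goes: the traceback shows startup.py saving to "../enron_mail_20150507.tar.gz", so a manually downloaded copy would presumably belong one directory above the folder that startup.py is run from. The following is only a sketch of unpacking such an archive by hand; the helper name "extract_archive" and the choice of destination directory are assumptions, not something taken from startup.py:

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Unpack a gzip-compressed tar archive into dest_dir.

    Returns the sorted top-level names found in the archive.
    Hypothetical helper; the destination directory is an assumption.
    """
    if not os.path.isfile(archive_path):
        raise IOError("archive not found: %s" % archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
        names = archive.getnames()
    # Keep only the first path component of every member.
    return sorted(set(name.split("/")[0] for name in names))


# Example use, run from the folder that contains startup.py:
#     extract_archive("../enron_mail_20150507.tar.gz", "..")
```

Unpacking an archive of this size can take a while, and an interrupted download will typically fail here with a read error, which is one way to tell whether the manual download completed.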