
What does urllib.request.urlretrieve do when its return value is not used?


The Python documentation says that urllib.request.urlretrieve returns a tuple, and the returned filename is then used to open the file, as shown in Code A below.

However, in Code B the return value of urllib.request.urlretrieve is not assigned to anything, yet the code fails if that line is removed. Please help clarify what urllib.request.urlretrieve is doing in Code B. Thanks.

Code A

>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()

Code B

import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing") # datasets\housing
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz") #datasets\housing\housing.tgz
    urllib.request.urlretrieve(housing_url, tgz_path) #what does this code here do?
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

Solution

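  • In Code B, passing a filename as the second argument makes urlretrieve save the downloaded content to that local path, which here is tgz_path.

    I'm not sure what you mean by it failing, but if that line is removed, nothing is downloaded and tarfile.open(tgz_path) has no file to open. A tuple is always returned; the only question is whether you keep it by assigning it to a variable. For example, the following still works, with the interpreter echoing the return value instead of storing it: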

    In [1]: import urllib.request

    In [2]: urllib.request.urlretrieve('http://python.org/', 'test.python')
    Out[2]: ('test.python', <http.client.HTTPMessage at 0x108d22390>)
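
To check this without network access, here is a minimal sketch that points urlretrieve at a file:// URL instead of the GitHub URL from Code B. The names demo_urlretrieve, source.txt and copy.txt are made up for illustration; only urlretrieve itself comes from the question. It calls the function once with the return value discarded (as in Code B) and once with it captured (as in Code A), and the target file is written either way.

```python
import tempfile
import urllib.request
from pathlib import Path


def demo_urlretrieve():
    """Copy a local file via a file:// URL and report what urlretrieve did."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the remote housing.tgz file.
        source = Path(tmp) / "source.txt"
        source.write_text("hello")
        # Plays the role of tgz_path in Code B.
        target = Path(tmp) / "copy.txt"

        # Return value discarded, as in Code B: the file is still written.
        urllib.request.urlretrieve(source.as_uri(), str(target))
        saved = target.read_text()

        # Return value captured, as in Code A: same call, same side effect.
        filename, headers = urllib.request.urlretrieve(source.as_uri(), str(target))
        return saved, filename == str(target)


saved, same_path = demo_urlretrieve()
print(saved, same_path)
```

This should print hello True: the content reaches the target path even when the tuple is thrown away, and the first element of the tuple is simply the filename that was passed in.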