The Python documentation says that urllib.request.urlretrieve
returns a tuple, whose first element is then used to open the file, as shown in Code A below.
In Code B, however, the return value of urllib.request.urlretrieve
is not assigned to anything, yet the code fails without that call. Could you clarify what urllib.request.urlretrieve
is doing in Code B? Thanks.
Code A
>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()
Code B
import os
import tarfile
from six.moves import urllib
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing") # datasets\housing
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz") # datasets\housing\housing.tgz
    urllib.request.urlretrieve(housing_url, tgz_path) # what does this code here do?
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
In the second example, passing a filename as the second argument makes urlretrieve save the downloaded content locally at that path. In this case, the path is tgz_path.
I'm not sure what you mean by it failing. A tuple is always returned; the only question is whether you assign it to a variable. For example, the following still works:
In [1]: import urllib.request
In [2]: urllib.request.urlretrieve('http://python.org/', 'test.python')
Out[2]: ('test.python', <http.client.HTTPMessage at 0x108d22390>)
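To make the point concrete, here is a minimal sketch showing that the file is written whether or not the return value is kept. It is only an illustration: the temporary directory, the file names source.txt and copy.txt, and the file:// URL are my own stand-ins so that it runs without network access. With an http(s) URL such as the one in Code B, the calls look the same.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical stand-in for a remote resource: a local file exposed through
# a file:// URL, so the sketch runs offline.
workdir = tempfile.mkdtemp()
source = Path(workdir) / "source.txt"
source.write_text("hello")
url = source.as_uri()

target = os.path.join(workdir, "copy.txt")

# Return value discarded, as in Code B: the content is still saved to `target`.
urllib.request.urlretrieve(url, target)
saved_without_assignment = os.path.exists(target)

# Return value kept, as in Code A: the first element is the path we passed in.
returned_path, headers = urllib.request.urlretrieve(url, target)

with open(target) as f:
    content = f.read()
```

So in Code B the call is there for its side effect (writing housing.tgz to disk), and tarfile.open(tgz_path) then reads the file it produced.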