Opening .tgz files with tarfile on Mac OS X


I'm studying the book Hands-On Machine Learning with Scikit-Learn and TensorFlow, and the first project, as you probably know, deals with a housing dataset.

When I try to open housing.tgz with tarfile, I get the error: `ReadError: file could not be opened successfully`.

I'm a Mac OS user and I have searched a lot, but I couldn't find any way to handle this problem. I even tried the solution in [this question][1], but it didn't work. It seems that Mac cannot open .tgz files. Can anybody help? (I copied the code from the book, and there's no problem with the code.)

Thanks in advance!

My code:

```python
import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml2/tree/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    # os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

fetch_housing_data()
```


  [1]: https://stackoverflow.com/questions/46651490/tarfile-cant-open-tgz

Solution

  • The URL you are using to download the archive does not seem to be correct: it points to a page on github.com, so you are downloading an HTML file rather than the archive, hence the error. The correct URL would be https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.tgz (note the domain).
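
For reference, here is a minimal sketch of the question's function with the raw-content URL swapped in. It assumes Python 3, so the standard `urllib.request` replaces `six.moves`. The `tarfile.is_tarfile` check and the `ValueError` are additions for illustration, not part of the book's code; they only make a wrong URL fail with a clearer message than `ReadError`.

```python
import os
import tarfile
import urllib.request

# Raw-content URL from the answer (raw.githubusercontent.com, not github.com).
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download housing.tgz into housing_path and extract it there."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)

    # Illustrative safety check (an assumption, not in the book's code):
    # an HTML page saved under a .tgz name is not a readable tar archive.
    if not tarfile.is_tarfile(tgz_path):
        raise ValueError(
            "Downloaded file is not a tar archive; check the URL: " + housing_url
        )

    # The context manager closes the archive even if extraction fails.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
    return tgz_path
```

Calling `fetch_housing_data()` with no arguments downloads from the raw URL and extracts into `datasets/housing`. With the original github.com URL, this version should raise the `ValueError` instead of failing inside `tarfile.open`.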