machine-learning deep-learning neural-network classification mnist

MNIST - problem with mnist.train_images() - HTTPError: Forbidden


I'm currently learning about neural networks and I want to use the train_images() function, but I'm unable to do so. If I run the following code:

import mnist

images = mnist.train_images()

I get the following error:

runfile('C:/Users/deriv/untitled0.py', wdir='C:/Users/deriv')
Traceback (most recent call last):

  File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\users\deriv\untitled0.py:3
    images = mnist.train_images()

  File ~\anaconda3\Lib\site-packages\mnist\__init__.py:161 in train_images
    return download_and_parse_mnist_file('train-images-idx3-ubyte.gz')

  File ~\anaconda3\Lib\site-packages\mnist\__init__.py:143 in download_and_parse_mnist_file
    fname = download_file(fname, target_dir=target_dir, force=force)

  File ~\anaconda3\Lib\site-packages\mnist\__init__.py:59 in download_file
    urlretrieve(url, target_fname)

  File ~\anaconda3\Lib\urllib\request.py:241 in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:

  File ~\anaconda3\Lib\urllib\request.py:216 in urlopen
    return opener.open(url, data, timeout)

  File ~\anaconda3\Lib\urllib\request.py:525 in open
    response = meth(req, response)

  File ~\anaconda3\Lib\urllib\request.py:634 in http_response
    response = self.parent.error(

  File ~\anaconda3\Lib\urllib\request.py:563 in error
    return self._call_chain(*args)

  File ~\anaconda3\Lib\urllib\request.py:496 in _call_chain
    result = func(*args)

  File ~\anaconda3\Lib\urllib\request.py:643 in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Forbidden

I installed mnist correctly using pip install, but I don't know why mnist.train_images() causes the error. Sorry if this is a simple question, but an answer would help me a lot.

I don't know whether I'm supposed to download the files directly from http://yann.lecun.com/exdb/mnist/. In any case, I can't do so, because I don't have permission to access those resources.


Solution

  • The web server at http://yann.lecun.com/exdb/mnist/ does indeed seem to be misconfigured. Since this dataset is built into many standard libraries such as keras (see this tutorial), I think it is rarely downloaded from the "lecun URL" directly.

    In the source (mnist/__init__.py) there is a comment:

    # `datasets_url` and `temporary_dir` can be set by the user using:
    # >>> mnist.datasets_url = 'http://my.mnist.url'
    # >>> mnist.temporary_dir = lambda: '/tmp/mnist'
    datasets_url = 'http://yann.lecun.com/exdb/mnist/'
    temporary_dir = tempfile.gettempdir
    

    So in theory, you could point the mnist.datasets_url variable at a mirror and it should work. The only mirror I found that keeps the original format is https://github.com/mkolod/MNIST. However, it is served over https, and it did not work for me.

    So instead you can manually download the data from the GitHub mirror into the temp directory shown by this code:

    import tempfile
    print(tempfile.gettempdir())
    

    And then mnist.train_images() should work.
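The manual download step could also be scripted. The following is only a minimal sketch, not part of the mnist package: fetch_mnist_files and its base_url parameter are hypothetical names, and base_url must point to a location that serves the four original .gz files under their standard names. It assumes, as described above, that mnist skips its own download when a file is already present in the temp directory.

```python
import os
import tempfile
from urllib.request import urlretrieve

# Standard file names of the original MNIST distribution.
MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def fetch_mnist_files(base_url, target_dir=None, fetch=urlretrieve):
    """Download each MNIST file that is missing from target_dir.

    base_url is an assumption: any mirror serving the files listed above.
    target_dir defaults to the system temp directory.
    Returns the local paths of all four files.
    """
    target_dir = target_dir or tempfile.gettempdir()
    paths = []
    for name in MNIST_FILES:
        path = os.path.join(target_dir, name)
        # Only fetch files that are not already on disk.
        if not os.path.exists(path):
            fetch(base_url.rstrip("/") + "/" + name, path)
        paths.append(path)
    return paths


# Hypothetical usage; replace the placeholder with a mirror you trust:
# fetch_mnist_files("https://example.com/mnist-mirror")
# import mnist
# images = mnist.train_images()
```

Files that are already in the target directory are left untouched, so running the function a second time does not download anything again.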