Tags: tensorflow, mnist, google-cloud-ml

Can't load mnist dataset on google cloud ml


I want to run a simple deep learning model for MNIST on Google Cloud ML. I try to download and unpack the dataset via TensorFlow's utility module tensorflow.examples.tutorials.mnist. Unfortunately, when I run in the cloud, the data is not visible to my code. I get an exception like this: No such file or directory: 'gs://bucket/path/train-images-idx3-ubyte.gz'. When I browse the bucket, the file is there, but TensorFlow doesn't see it.

What's wrong with it?


Solution

  • Unfortunately, TensorFlow's file system abstraction does not properly support Python's gzip library. As a result, mnist.read_data_sets only supports a train_dir on the local filesystem, i.e., you cannot use GCS with the utility functions.

    The workaround is to create a temporary directory on the local filesystem and use that instead.

    It seems to me that this is the default in the examples, e.g., mnist_softmax.py has a flag, --data_dir, that by default points to '/tmp/tensorflow/mnist/input_data'. To verify, I copied the contents of mnist_softmax.py into a new Python script, which ran successfully on Cloud Machine Learning Engine. The same approach also worked for mnist_deep.py.

    However, if you're going to manually use read_data_sets (via tensorflow/examples/tutorials/mnist/input_data.py), be sure to pass a local directory as the first argument.
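    The workaround above can be sketched as follows. This is a minimal, hedged example, not the canonical fix: the bucket path `gs://my-bucket/mnist` is a hypothetical placeholder, it assumes you have already staged the four MNIST archives there, and it uses the TF 1.x `tf.gfile.Copy` API (which does understand `gs://` paths, unlike Python's gzip module):

    ```python
    import os
    import tempfile

    # Hypothetical bucket path -- replace with your own staged copies.
    GCS_DATA_DIR = 'gs://my-bucket/mnist'

    MNIST_FILES = [
        'train-images-idx3-ubyte.gz',
        'train-labels-idx1-ubyte.gz',
        't10k-images-idx3-ubyte.gz',
        't10k-labels-idx1-ubyte.gz',
    ]

    def stage_mnist_locally(gcs_dir=None):
        """Create a local temp dir and, if gcs_dir is given, copy the
        pre-staged MNIST archives into it so read_data_sets finds them
        locally (and skips its own download)."""
        local_dir = tempfile.mkdtemp(prefix='mnist-')
        if gcs_dir:
            import tensorflow as tf  # TF 1.x; tf.gfile handles gs:// paths
            for name in MNIST_FILES:
                tf.gfile.Copy(os.path.join(gcs_dir, name),
                              os.path.join(local_dir, name),
                              overwrite=True)
        return local_dir

    # Then pass the *local* directory to read_data_sets:
    # from tensorflow.examples.tutorials.mnist import input_data
    # mnist = input_data.read_data_sets(stage_mnist_locally(GCS_DATA_DIR),
    #                                   one_hot=True)
    ```

    With no `gcs_dir` argument, `read_data_sets` will simply download the archives into the fresh temp directory itself; either way, the gzip decompression happens on the local filesystem, which is what the utility function requires.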