How to convert (same-size, categorized) images into a dataset for TensorFlow


I am learning to build a machine-learning model using TensorFlow.

I have successfully run the MNIST tutorial and would now like to test the model with my own images. They are all the same size (224x224) and are sorted into folders by class.

Now I would like to use those images as input for my model, as in the MNIST example. I tried to open the MNIST dataset, but it is unreadable; I guess it has been converted into some binary format. From the example, I think the MNIST dataset has a structure something like this:

  • mnist
    • test
      • images
      • labels
    • train
      • images
      • labels

How can I build a dataset that looks like the MNIST data from my own image files?

Thank you very much!


Solution

  • MNIST is not stored in an image format. On the MNIST website (http://yann.lecun.com/exdb/mnist/) you can see that it uses a specific binary format that is already close to a tensor or NumPy array, so it can be used in TensorFlow with minimal adjustments. It is essentially a matrix of numbers.

    To work with ordinary images (.jpg, for instance), use any Python image-processing library to convert them into an np.array. For example, PIL will work, as shown here: PIL and numpy

    Another option is to use TensorFlow's built-in functions to convert your images straight into tensors supported by TensorFlow; see https://www.tensorflow.org/versions/r0.9/api_docs/python/image.html
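
As a rough illustration of the PIL route, here is a minimal sketch that turns a folder of class sub-folders into MNIST-like arrays of images and labels. The helper names load_image_folder and one_hot are made up for this example, and it assumes a layout of root/<class_name>/<image file> in which every file is a readable image; adjust it to your own data.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root, size=(224, 224)):
    """Load images from root/<class_name>/<file> into MNIST-like arrays.

    Hypothetical helper: returns (images, labels, class_names). images has
    shape (N, height, width, 3) with float32 values in [0, 1]; labels holds
    the index of each image's folder within class_names.
    """
    # One sub-folder per class; sorting keeps the label mapping stable.
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    images, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        # Assumption: every file inside a class folder is an image.
        for filename in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, filename)
            with Image.open(path) as img:
                # Force 3 channels and a common (width, height) so that
                # all arrays have the same shape and can be stacked.
                img = img.convert("RGB").resize(size)
                images.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(label)
    return np.stack(images), np.array(labels, dtype=np.int64), class_names


def one_hot(labels, num_classes):
    """Convert integer labels to one-hot rows (one column per class)."""
    return np.eye(num_classes, dtype=np.float32)[labels]
```

Calling load_image_folder once on a training folder and once on a test folder (for example "my_images/train" and "my_images/test", both hypothetical paths) gives the same train/test, images/labels split sketched in the question. The resulting arrays can then be fed to the model in batches, with one_hot applied to the labels if the model expects one-hot targets as the MNIST tutorial does.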