
Reasoning about MNIST database


I cannot download the Keras MNIST dataset the usual way because I am behind a proxy.

So I downloaded a local copy from here: https://s3.amazonaws.com/img-datasets/mnist.pkl.gz

I am loading it into my notebook with the following code:

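import gzip
import pickle
import sys

# Open the locally downloaded archive; the with-block closes it automatically
with gzip.open('mnist.pkl.gz', 'rb') as f:
    if sys.version_info < (3,):
        data = pickle.load(f)
    else:
        # On Python 3, pass an explicit encoding when unpickling
        data = pickle.load(f, encoding='bytes')

print(data)
# Unpack into training and test sets (images and labels)
(X_train, y_train), (X_test, y_test) = data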

but I'm not really sure how to work with it.

I am trying to print the shapes like so:

print(X_train.shape)
print(y_train.shape)

but this gives the output:

(60000, 28, 28)
(60000,)

which doesn't really make sense to me. What does this actually mean? How can I print it more meaningfully?


Solution

  • The shape of X_train, (60000, 28, 28), means that you have 60,000 examples, each of shape (28, 28): in other words, 60,000 images of 28 by 28 pixels. They are grayscale, because there is no third (channel) dimension per image.

    The shape of y_train, (60000,), means that you have 60,000 labels, one for each image.

    If you want to see what an image looks like, you can plot it. For example, the first image:

    import matplotlib.pyplot as plt
    plt.imshow(X_train[0, :, :], cmap='gray')  # first image, grayscale colormap
    plt.title("image label: " + str(y_train[0]), fontsize=14)
    plt.show()
    

    Does that make it clearer?
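
As a complement to the shapes discussed above, here is a minimal sketch of one way to print a more descriptive summary than `.shape` alone. The helper name `describe_dataset` is hypothetical (it is not part of Keras or NumPy), and the arrays `X_demo` and `y_demo` are a small synthetic stand-in with the same layout as the MNIST arrays, used only so the snippet runs without the downloaded file.

```python
import numpy as np


def describe_dataset(X, y):
    """Summarize an image array X of shape (n, height, width) and its labels y."""
    n_images, height, width = X.shape
    labels, counts = np.unique(y, return_counts=True)
    return {
        "n_images": n_images,
        "image_size": (height, width),
        "dtype": str(X.dtype),
        "pixel_range": (int(X.min()), int(X.max())),
        # How many images carry each label
        "label_counts": {int(k): int(v) for k, v in zip(labels, counts)},
    }


# Synthetic stand-in (assumption): 100 random "images" laid out like MNIST,
# with labels cycling through 0..9. This is NOT the real dataset.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_demo = np.arange(100) % 10

summary = describe_dataset(X_demo, y_demo)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the real data loaded as in the question, calling `describe_dataset(X_train, y_train)` should report the number of images, the image size, and how many examples there are of each label.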