python, tensorflow, tensorflow-datasets

Using Cifar-10 dataset from tfds.load() correctly


I'm trying to use the Cifar-10 dataset to practice my CNN skills.

If I do this, it works:

from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

But I was trying to use tfds.load() instead, and I don't understand how to get the same images and labels out of it.

This downloads it:

import tensorflow_datasets as tfds

train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'])

Then I tried this, but it doesn't work:

import tensorflow as tf

assert isinstance(train_ds, tf.data.Dataset)
assert isinstance(test_ds, tf.data.Dataset)
(train_images, train_labels) = tuple(zip(*train_ds))
(test_images, test_labels) = tuple(zip(*test_ds))
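A likely reason the last two lines misbehave: without as_supervised=True, each element of the dataset is a dictionary of features, and unpacking a dictionary with * yields its keys, not its values. The sketch below uses plain Python lists as stand-ins for the datasets (the strings "img0", "img1" are placeholders for image tensors; this is an illustration, not real TFDS output) to contrast dict elements with (image, label) tuple elements.

```python
# Stand-in for a dataset whose elements are feature dicts
# (the default tfds.load output). Values are placeholders.
dict_elements = [
    {"image": "img0", "label": 0},
    {"image": "img1", "label": 1},
]

# zip(*...) unpacks each dict, and iterating a dict yields its keys,
# so this pairs up key names instead of images and labels.
images, labels = tuple(zip(*dict_elements))
print(images)  # ('image', 'image')
print(labels)  # ('label', 'label')

# Stand-in for elements shaped as (image, label) tuples
# (what as_supervised=True produces).
tuple_elements = [("img0", 0), ("img1", 1)]

# Here zip(*...) transposes the pairs into all-images and all-labels.
images, labels = tuple(zip(*tuple_elements))
print(images)  # ('img0', 'img1')
print(labels)  # (0, 1)
```

If the real feature dict has a third key (recent TFDS releases of cifar10 may also include an 'id' feature, depending on the version), unpacking three key tuples into two names raises a ValueError instead. With tuple elements, the result is a tuple of per-example tensors, which still needs np.stack to become one array.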

Can somebody show me how to do this?

Thank you!


Solution

  • You can also extract them like this:

    import tensorflow_datasets as tfds

    train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'],
                                  as_supervised=True,
                                  batch_size=-1)
    

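    To unpack the result of tfds.as_numpy() directly into images and labels, pass as_supervised and batch_size as shown. With as_supervised=True, each element has an (input, label) tuple structure; otherwise it is a dictionary of features. With batch_size=-1, tfds.load returns each split as full tf.Tensor objects instead of a tf.data.Dataset, so the whole split (50,000 training images for CIFAR-10) is loaded into memory at once.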

    With those arguments set, you simply call:

    train_images, train_labels = tfds.as_numpy(train_ds)
    

    Another way is to iterate over the dataset and collect the images and labels yourself (this assumes batch_size is not passed, so each element is a single example).

    With as_supervised=False (the default), each element is a dictionary:

    train_images, train_labels = [], []

    for example in train_ds:
        # Each element is a dict of features; pick out the image and label.
        train_images.append(example['image'])
        train_labels.append(example['label'])
    

    With as_supervised=True, each element is an (image, label) tuple:

    train_images, train_labels = [], []

    for image, label in train_ds:
        train_images.append(image)
        train_labels.append(label)
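The iteration approach above leaves you with Python lists of per-example tensors rather than arrays like the ones load_data() returns. The sketch below fills that gap under some assumptions: it builds a small synthetic tf.data.Dataset with the same dict structure as the default (as_supervised=False) CIFAR-10 split, so it runs without downloading anything; the random data, the example count N, and the helper name to_supervised are hypothetical. It maps dict elements to (image, label) tuples with Dataset.map (roughly what as_supervised=True gives you), collects the examples, and stacks them into NumPy arrays.

```python
import numpy as np
import tensorflow as tf

N = 8  # hypothetical example count; the real CIFAR-10 train split has 50,000

# Synthetic stand-in for tfds.load('cifar10', split='train') without
# as_supervised: each element is a dict with 'image' and 'label'.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(N, 32, 32, 3), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(N,), dtype=np.int64)
train_ds = tf.data.Dataset.from_tensor_slices(
    {"image": fake_images, "label": fake_labels}
)


def to_supervised(example):
    """Hypothetical helper: turn a feature dict into an (image, label) tuple."""
    return example["image"], example["label"]


supervised_ds = train_ds.map(to_supervised)

# Iterate (eager mode) and collect each example as a NumPy value.
train_images, train_labels = [], []
for image, label in supervised_ds:
    train_images.append(image.numpy())
    train_labels.append(label.numpy())

# Stack per-example arrays into single arrays with a leading example axis.
train_images = np.stack(train_images)  # shape (N, 32, 32, 3), dtype uint8
train_labels = np.array(train_labels)  # shape (N,)

# Optional: match the (N, 1) label layout of keras.datasets.cifar10.load_data().
train_labels_keras_style = train_labels.reshape(-1, 1)

print(train_images.shape, train_images.dtype)
print(train_labels.shape, train_labels_keras_style.shape)
```

The final reshape only matters if downstream code expects the Keras layout, where labels carry a trailing dimension of 1; TFDS labels are a flat vector.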