Tags: python, tensorflow, image-processing, tensorflow2.0, tensorflow-datasets

How to build TensorFlow input pipelines for images and their corresponding labels


I would like to create an image generator that yields images together with their labels. First, I import the data from a CSV file and map the 43 classes to integer indices using this code:

label_map = {v:i for i, v in enumerate(classes)}

The output will be something like this:

{'Danger': 4,
 'Give Way': 5,
 'Hump': 6,
 'Left Bend': 7,
 'Left Margin': 8,...}

Then I load the images from a directory using:

train_images = glob('/Desktop/dataset/resized_train/*')

Now I map the labels from the CSV file using:

train_labels = df['label'].map(label_map)

Now, when I want to show each image with its corresponding label, I cannot.

I used this code:

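def read_img(image_path):
    # Read the file and decode it into an RGB tensor.
    img = tf.io.read_file(image_path)
    img = tf.image.decode_image(img, channels=3)
    img.set_shape([None,None,3])
    # tf.image.resize expects the target size as [height, width].
    img = tf.image.resize(img, [image_h, image_w])
    img = img/255.0  # scale pixel values to [0, 1]
    return img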

def load_data(image_path, label):
    image = read_img(image_path)
    return image, label

def data_generator(features,labels):
    dataset = tf.data.Dataset.from_tensor_slices((features,labels))
    dataset = dataset.shuffle(buffer_size=100)
    autotune = tf.data.experimental.AUTOTUNE
    dataset = dataset.map(load_data, num_parallel_calls=autotune)
    dataset = dataset.batch(batch_size=batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(autotune)
    return dataset

def show_img(dataset):
    plt.figure(figsize=(15,15))
    for i in range(8):
        for val in dataset.take(1):
            img  = val[0][i]*255.0
            plt.subplot(4,2,i+1)
            plt.imshow(tf.cast(img,tf.uint8))
            plt.title(val[1][i].numpy())
            plt.subplots_adjust(hspace=1)
    plt.show()

train_dataset = data_generator(train_images,train_labels)
val_dataset = data_generator(val_images,val_labels)
show_img(train_dataset)

When I run show_img, it shows the images, but the labels are all 0.


Solution

  • Your code seems to be working fine. I initially thought the problem was that you pass train_labels as a pandas Series to from_tensor_slices, but that does not seem to be an issue. My best guess is that the buffer_size in dataset.shuffle is too small. For example, if I set buffer_size to 1, I get the same samples every time I call dataset.take(1), because according to the docs:

    [...] if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer [...]

    Maybe your first 100 elements all have the label 0? Again, this is only a suggestion. I managed to get your code to return different labels each time by using a buffer_size as large as the dataset:

    import pandas as pd
    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    
    classes = ['Danger', 'Give Way', 'Hump']
    label_map = {v:i for i, v in enumerate(classes)}
    d = {'label': ['Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump'],
         'other': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
    df = pd.DataFrame(data=d)
    
    train_labels = df['label'].map(label_map)
    
    def load_data(image, label):
        image /= 255.0
        return image, label
    
    features = tf.random.normal((9, 32, 32, 3))
    dataset = tf.data.Dataset.from_tensor_slices((features, train_labels))
    dataset = dataset.shuffle(buffer_size=9)
    dataset = dataset.map(load_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(batch_size=2)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    
    plt.figure(figsize=(5,5))
    for i in range(2):
        for val in dataset.take(1):
            img  = val[0][i]*255.0
            plt.subplot(1,2,i+1)
            plt.imshow(tf.cast(img,tf.uint8))
            plt.title(val[1][i].numpy())
            plt.subplots_adjust(hspace=1)
    plt.show()
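
To illustrate the buffer behaviour quoted from the docs, here is a minimal pure-Python sketch of a fixed-size shuffle buffer. It is a simplified model, not TensorFlow's implementation, and the label list (150 samples per class, sorted by class) is a made-up assumption. With buffer_size=100 and a batch of 8, the first batch can only come from the first 107 elements, which all carry label 0 here.

```python
import random

def buffered_shuffle(items, buffer_size, rng):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    it = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered element and refill its slot from the input.
    for x in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = x
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer

# Hypothetical labels, grouped by class as in a CSV sorted by label.
labels = [0] * 150 + [1] * 150 + [2] * 150
rng = random.Random(0)

small = list(buffered_shuffle(labels, buffer_size=100, rng=rng))[:8]
large = list(buffered_shuffle(labels, buffer_size=len(labels), rng=rng))[:8]

print(small)  # every entry is 0: only the first 107 elements were reachable
print(large)  # drawn from the whole dataset
```

If the real CSV is grouped by class in the same way, this would explain why every title in the plot shows 0.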
    

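Applied to the pipeline in the question, one option is to shuffle the (path, label) pairs with a buffer as large as the dataset before any image is decoded, and to fetch a single batch once when plotting. The following is only a sketch: the file names and labels are hypothetical stand-ins for train_images and train_labels, and the image-loading map step from the question is left out because no real files are assumed.

```python
import tensorflow as tf

# Hypothetical stand-ins for train_images / train_labels: 300 paths whose
# labels are grouped by class (0, then 1, then 2), as in a sorted CSV.
paths = [f"img_{i:03d}.png" for i in range(300)]
labels = [i // 100 for i in range(300)]

dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
# Shuffle the (path, label) pairs before decoding; a buffer covering the
# whole dataset gives a uniform shuffle and only has to hold short strings.
dataset = dataset.shuffle(buffer_size=len(paths))
# In the real pipeline, dataset.map(load_data, ...) would go here.
dataset = dataset.batch(8)

# Fetch one batch once, then index into it, instead of re-iterating the
# dataset for every subplot.
batch_paths, batch_labels = next(iter(dataset))
pairs = [(p.decode(), int(l))
         for p, l in zip(batch_paths.numpy(), batch_labels.numpy())]
for path, label in pairs:
    print(path, label)
```

Because the paths and labels are sliced together, each path stays attached to its own label after shuffling; only the order of the pairs changes.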