I would like to create an image generator that yields images together with their labels. First, I import the data from a csv file and then map the 43 classes using this code:
label_map = {v:i for i, v in enumerate(classes)}
The output will be something like this:
{'Danger': 4,
'Give Way': 5,
'Hump': 6,
'Left Bend': 7,
'Left Margin': 8,...}
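For context, the classes list comes from the csv itself, roughly like this (the file path is a placeholder for my actual one, but the column is called label):
import pandas as pd
from glob import glob
import tensorflow as tf
import matplotlib.pyplot as plt

df = pd.read_csv('/Desktop/dataset/labels.csv')  # csv with one row per image and a 'label' column
classes = sorted(df['label'].unique())           # the 43 distinct class names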
Then I load the images from a directory using:
train_images = glob('/Desktop/dataset/resized_train/*')
Next, I map the labels from the csv file using:
train_labels = df['label'].map(label_map)
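To sanity-check the mapping before building the dataset, I print a few path/label pairs (just a quick check, assuming the csv rows correspond to the globbed files):
for path, label in zip(train_images[:5], train_labels[:5]):
    print(path, label)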
However, when I try to show each image with its corresponding label, it does not work as expected. I used this code:
def read_img(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_image(img, channels=3)
    img.set_shape([None, None, 3])
    img = tf.image.resize(img, [image_w, image_h])
    img = img / 255.0
    return img
def load_data(image_path, label):
    image = read_img(image_path)
    return image, label
def data_generator(features, labels):
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(buffer_size=100)
    autotune = tf.data.experimental.AUTOTUNE
    dataset = dataset.map(load_data, num_parallel_calls=autotune)
    dataset = dataset.batch(batch_size=batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(autotune)
    return dataset
def show_img(dataset):
    plt.figure(figsize=(15, 15))
    for i in range(8):
        for val in dataset.take(1):
            img = val[0][i] * 255.0
            plt.subplot(4, 2, i + 1)
            plt.imshow(tf.cast(img, tf.uint8))
            plt.title(val[1][i].numpy())
            plt.subplots_adjust(hspace=1)
    plt.show()
train_dataset = data_generator(train_images,train_labels)
val_dataset = data_generator(val_images,val_labels)
show_img(train_dataset)
When I run show_img, it shows the images, but the labels are all 0.
Your code seems to be working fine. I initially thought it was not working correctly because you were passing train_labels as a pandas Series to from_tensor_slices, but that does not seem to be a problem. I can only imagine that the buffer_size in dataset.shuffle is too small. For example, if I set the buffer_size to 1, I get the same samples every time I call dataset.take(1), because, according to the docs:
[...] if your dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements in the buffer [...]
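To see the effect concretely, here is a tiny sketch with a toy dataset of ten integers (purely for illustration):
import tensorflow as tf

ds = tf.data.Dataset.range(10)
# With a buffer of 1 there is only ever one candidate element, so the order is unchanged.
print(list(ds.shuffle(buffer_size=1).as_numpy_iterator()))
# With a buffer covering the whole dataset, the order is fully randomized.
print(list(ds.shuffle(buffer_size=10).as_numpy_iterator()))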
Maybe your first 100 elements all have the label 0? Again, it's just a suggestion. I have managed to get your code to retrieve different labels each time by using a large buffer_size:
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
classes = ['Danger', 'Give Way', 'Hump']
label_map = {v:i for i, v in enumerate(classes)}
d = {'label': ['Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump', 'Danger', 'Give Way', 'Hump'],
'other': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)
train_labels = df['label'].map(label_map)
def load_data(image, label):
    image /= 255.0
    return image, label
features = tf.random.normal((9, 32, 32, 3))
dataset = tf.data.Dataset.from_tensor_slices((features, train_labels))
dataset = dataset.shuffle(buffer_size=9)
dataset = dataset.map(load_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(batch_size=2)
dataset = dataset.repeat()
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
plt.figure(figsize=(5,5))
for i in range(2):
    for val in dataset.take(1):
        img = val[0][i] * 255.0
        plt.subplot(1, 2, i + 1)
        plt.imshow(tf.cast(img, tf.uint8))
        plt.title(val[1][i].numpy())
        plt.subplots_adjust(hspace=1)
plt.show()
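If the small buffer turns out to be the culprit, the corresponding change on your side is simply to enlarge it in data_generator, for example (a sketch assuming the full list of image paths fits in memory, which it should since they are just strings):
def data_generator(features, labels):
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    # Use a buffer that covers the whole dataset so every element can end up anywhere.
    dataset = dataset.shuffle(buffer_size=len(features))
    dataset = dataset.map(load_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(batch_size=batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset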