I found this Sequence API online (I don't remember where, sorry):
import math
import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.utils import to_categorical

class PlanetSequence(tf.keras.utils.Sequence):
    """
    Custom Sequence object to train a model on out-of-memory datasets.
    """
    def __init__(self, df_path, data_path, im_size, batch_size, mode='train'):
        """
        df_path: path to a .csv file that contains columns with image names and labels
        data_path: path that contains the training images
        im_size: image size
        mode: when in training mode, data will be shuffled between epochs
        """
        self.df = pd.read_csv(df_path)
        self.im_size = im_size
        self.batch_size = batch_size
        self.mode = mode
        # Keep the labels and a list of image locations in memory
        labels_encoded = self.df['label'].values
        self.labels = to_categorical(labels_encoded, num_classes=11)
        self.image_list = self.df['image'].apply(
            lambda x: os.path.join(data_path, x)).tolist()
        self.on_epoch_end()  # initialise self.indexes

    def __len__(self):
        return int(math.ceil(len(self.df) / float(self.batch_size)))

    def on_epoch_end(self):
        # Shuffle indexes after each epoch
        self.indexes = list(range(len(self.image_list)))
        if self.mode == 'train':
            self.indexes = random.sample(self.indexes, k=len(self.indexes))

    def get_batch_labels(self, idx):
        # Fetch a batch of labels, following the (possibly shuffled) index order
        inds = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.labels[inds]

    def get_batch_features(self, idx):
        # Fetch a batch of images, following the (possibly shuffled) index order
        inds = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.array([load_image(self.image_list[i], self.im_size) for i in inds])

    def __getitem__(self, idx):
        batch_x = self.get_batch_features(idx)
        batch_y = self.get_batch_labels(idx)
        return batch_x, batch_y
And in the load_image function, we have this:
def load_image(image_path, size):
    # data augmentation logic such as random rotations can be added here
    return img_to_array(load_img(image_path, target_size=(size, size))) / 255.
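For reference, here is a rough sketch of the kind of augmentation I mean, written with plain NumPy (random flips and 90-degree rotations as examples) rather than any particular Keras API:

```python
import numpy as np

def augment_image(img):
    """Randomly flip and rotate an (H, W, C) image array in [0, 1]."""
    if np.random.rand() < 0.5:                   # random horizontal flip
        img = np.fliplr(img)
    img = np.rot90(img, k=np.random.randint(4))  # random 90-degree rotation
    return img
```

Something like this could be called on the array returned by load_image, but I would rather reuse Keras's built-in augmentation if possible.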
It seems that I could apply data augmentation there, but I can't figure out how.
I thought about using Keras's ImageDataGenerator and calling flow to get augmented images, but I couldn't make that work.
What's the best approach to deal with this?
I have modified the answer quite a bit. I will try to fit the data generator into your code as well; meanwhile, I would suggest the following way to use the image generator, with some basic housekeeping of your data.
First, read the train CSV and use shutil to arrange your images into the folder structure shown below:

import shutil

Iterate over the rows of the CSV and copy each image into the folder for its class with shutil.copy(path given in csv, <destination folder>). Read both CSVs this way and use shutil to move your images into the hierarchy mentioned below; believe me, the data keeping will take much less time than you expect. You can make multiple subfolders (one per class) within the train and test folders.
|__ train
|______ planet: [contains planet images]
|______ star: [contains star images]
|__ test
|______ planet: [contains planet images]
|______ star: [contains star images]
train_dir = os.path.join(PATH, 'train')
test_dir = os.path.join(PATH, 'test')

train_planets_dir = os.path.join(train_dir, 'planet')  # directory with our training planet images
train_stars_dir = os.path.join(train_dir, 'star')      # directory with our training star images

# similarly for the test split
test_planets_dir = os.path.join(test_dir, 'planet')
test_stars_dir = os.path.join(test_dir, 'star')
Now create an ImageDataGenerator with all the augmentations you need (see the constructor arguments for the different augmentations, and enable the ones you require; the ones below are just examples):

train_image_generator = ImageDataGenerator(rescale=1./255,
                                           rotation_range=45,
                                           horizontal_flip=True)
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')
Note that **train_dir** is the common parent path that contains all the class subfolders. Use class_mode='categorical' instead of 'binary' if you have more than two classes.
Similarly for test (usually with rescaling only, no augmentation):

test_image_generator = ImageDataGenerator(rescale=1./255)
test_data_gen = test_image_generator.flow_from_directory(batch_size=batch_size,
                                                         directory=test_dir,
                                                         target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                         class_mode='binary')
This way you keep your data organized, you can use the data generator efficiently, and the labeling is handled automatically from the folder names.
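If you would rather keep your original Sequence, one option (a sketch, not a drop-in tested against your exact code) is to keep an ImageDataGenerator around and call its random_transform method on each loaded image inside get_batch_features:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# the augmentations here are examples; enable the ones you need
augmenter = ImageDataGenerator(rotation_range=45, horizontal_flip=True)

def get_batch_features(self, idx):
    # intended as a replacement for PlanetSequence.get_batch_features
    batch_images = self.image_list[idx * self.batch_size:(idx + 1) * self.batch_size]
    batch = np.array([load_image(im, self.im_size) for im in batch_images])
    if self.mode == 'train':
        # apply one random combination of the configured augmentations per image
        batch = np.array([augmenter.random_transform(x) for x in batch])
    return batch
```

random_transform augments a single image array at a time, so the labels returned by get_batch_labels stay aligned with the images.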
hope it helps a bit.