python keras semantic-segmentation image-preprocessing

How should image preprocessing and data augmentation be for semantic segmentation?

I have an imbalanced and small dataset which contains 4116 224x224x3 (RGB) aerial images. It's very likely that I will encounter the overfitting problem since the dataset is not big enough. Image preprocessing and data augmentation help to tackle this problem as explained below.

"Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images."

Deep Learning with Python by François Chollet, page 138-139, 5.2.5 Using data augmentation.

I've read Medium - Image Data Preprocessing for Neural Networks and examined Stanford's CS230 - Data Preprocessing and CS231 - Data Preprocessing courses. It is highlighted once more in SO question and I understand that there is no "one fits all" solution. Here is what forced me to ask this question:

"No translation augmentation was used since we want to achieve high spatial resolution."

Reference: Researchgate - Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks

I know that I will use Keras - ImageDataGenerator Class, but don't know which techniques and what parameters to use for the semantic segmentation on small objects task. Could someone enlighten me? Thanks in advance. :)

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,      # is a value in degrees (0–180)
    width_shift_range=0.2,  # is a range within which to randomly translate pictures horizontally.
    height_shift_range=0.2, # is a range within which to randomly translate pictures vertically.
    shear_range=0.2,        # is for randomly applying shearing transformations.
    zoom_range=0.2,         # is for randomly zooming inside pictures.
    horizontal_flip=True,   # is for randomly flipping half the images horizontally
    fill_mode='nearest',    # is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift
    featurewise_center=True,
    featurewise_std_normalization=True)

datagen.fit(X_train)

Solution

The augmentation and preprocessing phases are always depending on the problem that you have. You have to think of all the possible augmentation which can enlarge your dataset. But the most important thing is, that you should not perform extreme augmentations, which makes new training samples in the way which can not happen in real examples. If you do not expect that the real examples will be horizontally flipped do not perform horizontal flip, since this will give your model false information. Think of all the possible changes that can happen in your input images and try to artificially produce new images from your existing one. You can use a lot of built-in functions from Keras. But you should be aware of each that it will not make new examples which are not likely to be present on the input of your model.

As you said, there is no "one fits all" solution, because everything is dependent on the data. Analyse the data and build everything with respect to it.

About the small objects - one direction which you should check are the loss functions which emphasise the impact of target volumes in comparison to the background. Look at the Dice Loss or Generalised Dice Loss.