Tags: tensorflow, google-colaboratory, tensorflow2.0

How to improve model to prevent overfitting for very simple image classification


First of all: I'm a beginner with TensorFlow (version 2). I'm learning a lot by reading, but I can't seem to find an answer to the following problem.

I'm trying to build a model for classifying images into three labels. As you can see in the graphs below, my training accuracy is quite OK, but the validation accuracy is way too low.

Training & Validation Accuracy

As I understand it, this is probably an 'overfitting' problem.

Maybe I'll first explain what I'm trying to do:

I want to use images as input. As output I'd like to receive zero or more labels (classes) that belong to those images. I was expecting this to be an easy task, since the input images are simple: only two colors, and only 0, 1, 2 or 3 possible 'labels'. Here are a few examples of the images. They are a representation of a walked track (green) on a field (bounded by a blue polygon):

example input images

Possible labels are:

  1. cross (first 2 images): you can clearly see that the green lines form one or more 'crosses'
  2. zig-zag (third image): not exactly sure if this is the correct term in English, but I guess you get the picture ;-)
  3. rows: the green lines are mostly parallel lines (no zigzag, nor cross)
  4. none of above (don't know if this needs to be a label)

I'm using the following model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

batch_size = 128
epochs = 30
IMG_HEIGHT = 150
IMG_WIDTH = 150

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', 
           input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Dropout(0.2),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')
])



model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])


model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_15 (Conv2D)           (None, 150, 150, 16)      448       
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 75, 75, 16)        0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 75, 75, 16)        0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 75, 75, 32)        4640      
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 37, 37, 32)        0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 37, 37, 64)        18496     
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 18, 18, 64)        0         
_________________________________________________________________
dropout_11 (Dropout)         (None, 18, 18, 64)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 20736)             0         
_________________________________________________________________
dense_10 (Dense)             (None, 512)               10617344  
_________________________________________________________________
dense_11 (Dense)             (None, 3)                 1539      
=================================================================
Total params: 10,642,467
Trainable params: 10,642,467
Non-trainable params: 0

I'm using 3360 images as the training dataset and 496 as the validation dataset. Those are already 'augmented': the sets already contain rotated and mirrored versions of the original images.

It is probably worth mentioning that the dataset is unbalanced: 80% of the images contain the label 'cross', while the remaining 20% is covered by 'zig-zag' and 'rows'.

Can anybody guide me in the right direction on how to improve my model?


Solution

  • You want the network to output 3 possible labels, so the last layer in your model should be able to do that. In practice you can change it to Dense(3, activation='sigmoid'): with a sigmoid on each of the three units, every label gets an independent probability, which is what a multi-label problem needs (an image can be 'cross' and 'zig-zag' at the same time).

    I don't know why it doesn't give you an error during training, but you should also double-check how you are feeding inputs and labels to the network.
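    For a Dense(3, activation='sigmoid') head trained with binary_crossentropy, each target should be a 3-element multi-hot vector, not a single class index. A minimal sketch of that encoding in plain Python (the label names match the question; the helper name is illustrative):

    ```python
    # One slot per class, in a fixed order.
    LABELS = ["cross", "zig-zag", "rows"]

    def multi_hot(label_set):
        """Turn a set of label names into a 0/1 target vector, one slot per class."""
        return [1.0 if name in label_set else 0.0 for name in LABELS]

    # An image can carry zero, one, or several labels:
    print(multi_hot({"cross"}))             # -> [1.0, 0.0, 0.0]
    print(multi_hot({"cross", "zig-zag"}))  # -> [1.0, 1.0, 0.0]
    print(multi_hot(set()))                 # -> [0.0, 0.0, 0.0]
    ```

    Note that the all-zeros vector covers your "none of the above" case for free, so it does not need its own label slot.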