python, deep-learning, computer-vision, image-segmentation, semantic-segmentation

How to get better results in water segmentation?


This is my first computer vision project and I am still learning the basics. I am using a water segmentation dataset from Kaggle and trained a model on 1,888 images. The goal is semantic segmentation: segmenting the water regions in each image. The model is a U-Net. It performs reasonably well on the test images and I get somewhat decent results, but when I run a prediction on a completely new image the result is very bad. I also tried different pre-trained models, but they performed even worse. The model architecture is in the code below; output images are attached. Does anyone know a better approach, or see what I am doing wrong here?

from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPool2D,
                                     Concatenate, BatchNormalization, Activation)
from tensorflow.keras.optimizers import Adam


def conv_block(inputs, num_filters):
    x = Conv2D(num_filters, 3, padding="same")(inputs)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(num_filters, 3, padding="same")(x)
    x = BatchNormalization()(x)  # not in the original U-Net
    x = Activation("relu")(x)

    return x


def encoder_block(inputs, num_filters):
    x = conv_block(inputs, num_filters)
    p = MaxPool2D((2, 2))(x)
    return x, p


def decoder_block(inputs, skip_features, num_filters):
    x = Conv2DTranspose(num_filters, (2, 2), strides=2, padding="same")(inputs)
    x = Concatenate()([x, skip_features])
    x = conv_block(x, num_filters)
    return x


def build_unet(input_shape, n_classes):
    inputs = Input(input_shape)

    s1, p1 = encoder_block(inputs, 64)
    s2, p2 = encoder_block(p1, 128)
    s3, p3 = encoder_block(p2, 256)
    s4, p4 = encoder_block(p3, 512)

    b1 = conv_block(p4, 1024) #Bridge

    d1 = decoder_block(b1, s4, 512)
    d2 = decoder_block(d1, s3, 256)
    d3 = decoder_block(d2, s2, 128)
    d4 = decoder_block(d3, s1, 64)

    if n_classes == 1:  # binary segmentation
        activation = 'sigmoid'
    else:
        activation = 'softmax'

    outputs = Conv2D(n_classes, 1, padding="same", activation=activation)(d4)  # activation chosen above based on n_classes

    model = Model(inputs, outputs, name="U-Net")
    return model

input_shape = (256, 256, 3)  # assumed input size; set this to the size your images are resized to
model = build_unet(input_shape, n_classes=1)
model.compile(optimizer=Adam(learning_rate=1e-3), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
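
For reference, prediction on a new image follows the usual pipeline below. This is a simplified sketch rather than my exact code; the file name and target size are placeholders:

import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def predict_mask(model, image_path, target_size=(256, 256)):
    # Load the image and resize it to the size the network was trained on.
    img = load_img(image_path, target_size=target_size)
    x = img_to_array(img) / 255.0         # scale pixel values to [0, 1]
    x = np.expand_dims(x, axis=0)         # add the batch dimension
    prob = model.predict(x)[0, :, :, 0]   # (H, W) probability map
    return (prob > 0.5).astype(np.uint8)  # threshold to a binary water mask

mask = predict_mask(model, "new_image.jpg")  # placeholder file name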

This is the result on test data:

[image: predicted masks on test images]

And this is the result on new images:

[image: predicted masks on new images]


Solution

  • My guess is that some pre-processing step applied to your test data is not applied to the new images. Specifically, I would look into image normalization: in most image-processing neural networks, input images are normalized to have (roughly) zero mean and unit variance. If this kind of normalization is part of your training/evaluation code, you must apply exactly the same normalization to new images before passing them through the trained U-Net.
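
For example, if you standardize with statistics computed on the training set, the very same transform has to be applied to every new image. A minimal sketch, assuming arrays named X_train, X_test and new_image (these names are illustrative, not from the question):

import numpy as np

# Compute the statistics once, over the training set only.
train_mean = X_train.mean(axis=(0, 1, 2))  # per-channel mean
train_std = X_train.std(axis=(0, 1, 2))    # per-channel std

def normalize(images):
    # The identical transform must be used at train, test and inference time.
    return (images.astype(np.float32) - train_mean) / (train_std + 1e-7)

X_train_n = normalize(X_train)  # feed this to model.fit(...)
X_test_n = normalize(X_test)    # feed this to model.evaluate(...)
pred = model.predict(normalize(new_image[np.newaxis, ...]))  # single new image

If you instead only divide by 255 during training, then dividing by 255 is also the only thing you should do to new images; the point is consistency, not any particular normalization scheme.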