Tags: python, tensorflow, keras, watermark, image-classification

Watermark binary classifier in TensorFlow Keras stuck


My goal is to build a model that classifies pictures by whether ONE particular watermark is present or not. To check for a different watermark, I would ideally create another dataset with that new watermark and retrain the model. As I understand it, this is a binary classification problem.

Is this the right approach?

I am stuck: my model does not learn to identify whether a picture has the watermark on it or not, and my metrics don't move during training. Example:

loss: 0.6931 - accuracy: 0.5000 - val_loss: 0.6931 - val_accuracy: 0.5000

I have prepared a data folder structure like:

Training

  • Watermark
  • No_watermark

Validation

  • Watermark
  • No_watermark

I have used a dataset with 1000 images in each category. Here is an example from my dataset with my own watermark:

I hope you can help with the following questions:

  1. How can I change my model to "recognize" the watermark?
  2. Why do my "loss" and "accuracy" not move even if I change the image size, the number of epochs, or the dataset?
  3. Should I train the model on only the watermark image, with augmentation and no background?

My model and training code:

import tensorflow as tf

# train_generator and validation_generator are defined elsewhere (not shown in the post).
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(250, 250, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, activation='relu'),
    # Single sigmoid unit: expects one 0/1 label per image.
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    epochs=25,
                    validation_data=validation_generator,
                    verbose=1,
                    validation_steps=3)

Thanks


Solution

  • Since you are performing binary classification, have you set the class_mode parameter of ImageDataGenerator.flow_from_directory to 'binary'? The default is 'categorical', which yields one-hot labels with one column per class. That is not what you want here, because your model has a single output node.

    It's a common pitfall. My guess is that accuracy starts at 0.5 because you likely have an equal number of watermarked and non-watermarked images, and that performance never improves because the wrong value of class_mode is being used.

    TL;DR: Pass class_mode='binary' (instead of the default class_mode='categorical') to flow_from_directory, for both the training and the validation generator.
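
To see why the metrics can freeze at exactly 0.6931 and 0.5000, here is a minimal sketch in plain Python (no TensorFlow needed). It assumes that the loss and the accuracy metric compare the model's single sigmoid output against both columns of a one-hot label, which is what the reported numbers suggest; depending on the version, a shape error may be raised instead. The helpers bce and accuracy are made up for this illustration and are not Keras functions.

```python
import math

def bce(labels, p):
    """Mean binary cross-entropy of one sigmoid output p (0 < p < 1) against each label entry."""
    return sum(-(y * math.log(p) + (1 - y) * math.log(1 - p)) for y in labels) / len(labels)

def accuracy(labels, p):
    """Fraction of label entries matched by the output thresholded at 0.5."""
    predicted = 1 if p > 0.5 else 0
    return sum(predicted == y for y in labels) / len(labels)

# One watermarked image, labelled two ways. Folder names are indexed alphabetically,
# so No_watermark -> 0 and Watermark -> 1.
one_hot = [0, 1]  # label produced by class_mode='categorical' with two classes
binary = [1]      # label produced by class_mode='binary'

for p in (0.1, 0.5, 0.9):
    print(f"p={p}: "
          f"one-hot loss={bce(one_hot, p):.4f} acc={accuracy(one_hot, p):.1f} | "
          f"binary loss={bce(binary, p):.4f} acc={accuracy(binary, p):.1f}")
```

With the one-hot label, every prediction agrees with one column and disagrees with the other, so accuracy is pinned at 0.5 and the lowest loss the model can reach is ln 2 ≈ 0.6931, at p = 0.5. With a single 0/1 label per image, a confident correct prediction lowers the loss and counts as a hit, so training has something to optimize.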