Tags: tensorflow, keras, deep-learning, conv-neural-network, face-recognition

The second epoch's initial loss is not related to the first epoch's final loss, and the loss and accuracy stay constant every epoch


The loss at the start of the second epoch bears no relation to the loss at the end of the first epoch. From then on, the loss and accuracy stay exactly the same every epoch. I have some background in deep learning, but this is my first time implementing my own model, so I want to understand intuitively what is going wrong. The dataset is cropped faces with two classes of 300 pictures each. I highly appreciate your help.

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
# use tf.keras consistently rather than mixing it with the standalone keras package
from tensorflow.keras.layers import ActivityRegularization, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# every other option was left at its default value, so only rescaling remains
image_generator = ImageDataGenerator(rescale=1./255)

# the default target_size of (256, 256) matches the model's input shape
image = image_generator.flow_from_directory('./util/untitled folder', batch_size=938)

x, y = next(image)  # a single batch holding the whole dataset
x_train = x[:500]
y_train = y[:500]
x_test = x[500:600]
y_test = y[500:600]

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(4)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(4)

plt.imshow(x_train[0])

def convolutional_model(input_shape):

    input_img = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(64, (7,7), padding='same')(input_img)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=1, padding='same')(x)
    x = Dropout(0.5)(x)
    x = tf.keras.layers.Conv2D(128, (3, 3), padding='same', strides=1)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', strides=4)(x)
    x = tf.keras.layers.Flatten()(x)
    x = ActivityRegularization(l1=0.1, l2=0.2)(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)


    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model


conv_model = convolutional_model((256, 256, 3))
conv_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.SGD(learning_rate=1),  # learning_rate replaces the deprecated lr
                   metrics=['accuracy'])
conv_model.summary()

conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset)


Epoch 1/100
2021-12-23 15:06:22.165763: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2021-12-23 15:06:22.172255: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
125/125 [==============================] - 35s 275ms/step - loss: 804.6805 - accuracy: 0.4860 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 2/100
125/125 [==============================] - 34s 270ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 3/100
125/125 [==============================] - 34s 276ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980

Solution

  • Since the loss and accuracy are constant, it is highly likely that your network is not learning anything (with two classes, it simply always predicts one of them).

    The activation function, the loss function, and the number of neurons in the last layer are all correct.

    The problem is not related to the way you load the images but to the learning rate, which is 1. At such a high learning rate the weight updates overshoot, and the network cannot learn anything.

    You should start with a much smaller learning rate, for example 0.0001 or 0.00001, and only go back to debugging the data-loading process if performance is still poor; a sketch of the fix follows below.
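As a minimal sketch of that fix, reusing the conv_model, train_dataset, and test_dataset defined in the question (the learning rate of 1e-4 here is just an assumed starting point to tune, not a verified optimum):

# recompile with a much smaller learning rate before training again
conv_model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.SGD(learning_rate=1e-4),
                   metrics=['accuracy'])
conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset)

You can also check the "always predicts one class" hypothesis directly: if the predicted class index is identical for every test sample, the network has collapsed to a constant prediction.

import numpy as np

preds = conv_model.predict(test_dataset)    # softmax outputs, shape (100, 2)
print(np.unique(np.argmax(preds, axis=1)))  # a single value means one class only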