Tags: python, tensorflow, keras, deep-learning, transfer-learning

Transfer learning: model is giving unchanged loss results. Is it not training?


I'm trying to train a regression model on top of Inception V3. The inputs are images of shape (96, 320, 3). There are 16k+ images in total, of which 12k+ are used for training and the rest for validation. I have frozen all layers in Inception; unfreezing them (already tried) does not help either. I've replaced the top of the pre-trained model with a few layers, as shown in the code below.

import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.layers import Input, Lambda, GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from tensorflow.keras.losses import Huber
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

X_train = preprocess_input(X_train)

# Frozen InceptionV3 backbone
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
inception.trainable = False
inception.summary()

# Resize the (96, 320, 3) frames to the 299x299 input expected by InceptionV3
driving_input = Input(shape=(96,320,3))
resized_input = Lambda(lambda image: tf.image.resize(image,(299,299)))(driving_input)
inp = inception(resized_input)

x = GlobalAveragePooling2D()(inp)

x = Dense(512, activation = 'relu')(x)
x = Dense(256, activation = 'relu')(x)
x = Dropout(0.25)(x)
x = Dense(128, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)
x = Dropout(0.25)(x)
result = Dense(1, activation = 'relu')(x)

lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=100000, decay_rate=0.95)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model = Model(inputs = driving_input, outputs = result)
model.compile(optimizer=optimizer, loss=loss)

checkpoint = ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)

batch_size = 32
epochs = 100

model.fit(x=X_train, y=y_train, shuffle=True, validation_split=0.2, epochs=epochs, 
          batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])

This results in the training and validation loss staying essentially unchanged across epochs (screenshot of the training output omitted).

Why is my model not training, and what can I do to fix it?
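
For reference, a quick way to check whether anything is being learned at all is to compare the regression head's weights before and after a short fit. A minimal sketch, reusing the model, X_train and y_train defined above:

import numpy as np

# Snapshot the output layer's kernel, train briefly, then compare
w_before = model.layers[-1].get_weights()[0].copy()
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=1)
w_after = model.layers[-1].get_weights()[0]

# A result of (near) zero means no meaningful gradient is reaching the head
print("max weight change in output layer:", np.abs(w_after - w_before).max())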


Solution

  • Since your problem is a regression problem, the activation of the last layer should be linear instead of relu: a relu output can only produce non-negative values and, combined with a large learning rate, can easily get stuck outputting zero, which matches a loss that never improves. The learning rate is also far too high; consider lowering it to suit your overall setup. Here I'm showing a code sample with MNIST; a sketch applying the same two changes to your original model follows after the output below.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

    # data
    (xtrain, train_target), (xtest, test_target) = tf.keras.datasets.mnist.load_data()
    # MNIST is grayscale, so repeat the single channel 3 times to match the
    # 3-channel input expected by the pretrained weights
    x_train = np.expand_dims(xtrain, axis=-1)
    x_train = np.repeat(x_train, 3, axis=-1)
    x_train = x_train.astype('float32') / 255
    # prepare a synthetic regression target (squared digit label)
    ytrain4 = tf.square(tf.cast(train_target, tf.float32))
    
    # base model 
    inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(75,75,3))
    inception.trainable = False
    
    # input and resize layers
    driving_input = tf.keras.layers.Input(shape=(28,28,3))
    resized_input = tf.keras.layers.Lambda(lambda image: tf.image.resize(image,(75,75)))(driving_input)
    inp = inception(resized_input)
    
    # top model 
    x = GlobalAveragePooling2D()(inp)
    x = Dense(512, activation = 'relu')(x)
    x = Dense(256, activation = 'relu')(x)
    x = Dropout(0.25)(x)
    x = Dense(128, activation = 'relu')(x)
    x = Dense(64, activation = 'relu')(x)
    x = Dropout(0.25)(x)
    result = Dense(1, activation = 'linear')(x)
    
    # hyper-param
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=0.0001, 
                                                                 decay_steps=100000, decay_rate=0.95)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
    loss = tf.keras.losses.Huber(delta=0.5, reduction="auto", name="huber_loss")
    
    # build and compile the model
    model = tf.keras.Model(inputs = driving_input, outputs = result)
    model.compile(optimizer=optimizer, loss=loss)
    
    # callbacks
    checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
    stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)
    
    batch_size = 32
    epochs = 10
    
    # fit 
    model.fit(x=x_train, y=ytrain4, shuffle=True, validation_split=0.2, epochs=epochs, 
              batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])
    

    Output

    Epoch 1/10
    1500/1500 [==============================] - 27s 18ms/step - loss: 5.2239 - val_loss: 3.6060
    Epoch 2/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 3.5634 - val_loss: 2.9022
    Epoch 3/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 3.0629 - val_loss: 2.5063
    Epoch 4/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.7615 - val_loss: 2.3764
    Epoch 5/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.5371 - val_loss: 2.1303
    Epoch 6/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.3848 - val_loss: 2.1373
    Epoch 7/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.2653 - val_loss: 1.9039
    Epoch 8/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.1581 - val_loss: 1.9087
    Epoch 9/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.0518 - val_loss: 1.7193
    Epoch 10/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 1.9699 - val_loss: 1.8837
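
    Applied to the original model in the question, the two changes are small. A minimal sketch, assuming the driving_input and x tensors from the question's code are already defined; the 1e-4 starting rate is an example value, not a tuned one (with decay_steps=100000 the schedule barely decays within an epoch, so the initial rate is what matters):

    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.optimizers.schedules import ExponentialDecay
    from tensorflow.keras.losses import Huber

    # Fix 1: linear activation for the regression output
    result = Dense(1, activation='linear')(x)

    # Fix 2: a much lower initial learning rate than 0.1 (1e-4 is illustrative, not tuned)
    lr_schedule = ExponentialDecay(initial_learning_rate=1e-4, decay_steps=100000, decay_rate=0.95)

    model = Model(inputs=driving_input, outputs=result)
    model.compile(optimizer=Adam(learning_rate=lr_schedule), loss=Huber(delta=0.5))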