Tags: python, tensorflow, keras, deep-learning, transfer-learning

Transfer learning: model is giving unchanged loss results. Is it not training?


I'm trying to train a regression model on top of Inception V3. The inputs are images of shape (96, 320, 3). There are 16k+ images in total, of which 12k+ are used for training and the rest for validation. I have frozen all layers in Inception; unfreezing them (already tried) does not help either. I've replaced the top of the pre-trained model with a few layers, as shown in the code below.

import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.layers import Input, Lambda, GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from tensorflow.keras.losses import Huber
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

X_train = preprocess_input(X_train)

# Frozen InceptionV3 backbone
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
inception.trainable = False
inception.summary()

# Resize the (96, 320, 3) frames to the 299x299 input expected by InceptionV3
driving_input = Input(shape=(96,320,3))
resized_input = Lambda(lambda image: tf.image.resize(image,(299,299)))(driving_input)
inp = inception(resized_input)

x = GlobalAveragePooling2D()(inp)

x = Dense(512, activation = 'relu')(x)
x = Dense(256, activation = 'relu')(x)
x = Dropout(0.25)(x)
x = Dense(128, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)
x = Dropout(0.25)(x)
result = Dense(1, activation = 'relu')(x)

lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=100000, decay_rate=0.95)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model = Model(inputs = driving_input, outputs = result)
model.compile(optimizer=optimizer, loss=loss)

checkpoint = ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)

batch_size = 32
epochs = 100

model.fit(x=X_train, y=y_train, shuffle=True, validation_split=0.2, epochs=epochs, 
          batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])

This results in the training and validation loss staying essentially unchanged across epochs (screenshot of the training output omitted).

Why is my model not training, and what can I do to fix it?
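
For reference, a quick way to check whether anything is being learned at all is to compare the regression head's weights before and after a short fit. A minimal sketch, reusing the model, X_train and y_train defined above:

import numpy as np

# Snapshot the output layer's kernel, train briefly, then compare
w_before = model.layers[-1].get_weights()[0].copy()
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=1)
w_after = model.layers[-1].get_weights()[0]

# A result of (near) zero means no meaningful gradient is reaching the head
print("max weight change in output layer:", np.abs(w_after - w_before).max())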


Solution

  • Since your problem is a regression problem, the activation of the last layer should be linear instead of relu: a relu output can only produce non-negative values and, combined with a large learning rate, can easily get stuck outputting zero, which matches a loss that never improves. The learning rate is also far too high; consider lowering it to suit your overall setup. Here I'm showing a code sample with MNIST; a sketch applying the same two changes to your original model follows after the output below.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

    # data
    (xtrain, train_target), (xtest, test_target) = tf.keras.datasets.mnist.load_data()
    # MNIST is grayscale, so repeat the single channel 3 times to match the
    # 3-channel input expected by the pretrained weights
    x_train = np.expand_dims(xtrain, axis=-1)
    x_train = np.repeat(x_train, 3, axis=-1)
    x_train = x_train.astype('float32') / 255
    # prepare a synthetic regression target (squared digit label)
    ytrain4 = tf.square(tf.cast(train_target, tf.float32))
    
    # base model 
    inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(75,75,3))
    inception.trainable = False
    
    # input and resize layers
    driving_input = tf.keras.layers.Input(shape=(28,28,3))
    resized_input = tf.keras.layers.Lambda(lambda image: tf.image.resize(image,(75,75)))(driving_input)
    inp = inception(resized_input)
    
    # top model 
    x = GlobalAveragePooling2D()(inp)
    x = Dense(512, activation = 'relu')(x)
    x = Dense(256, activation = 'relu')(x)
    x = Dropout(0.25)(x)
    x = Dense(128, activation = 'relu')(x)
    x = Dense(64, activation = 'relu')(x)
    x = Dropout(0.25)(x)
    result = Dense(1, activation = 'linear')(x)
    
    # hyper-param
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=0.0001, 
                                                                 decay_steps=100000, decay_rate=0.95)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
    loss = tf.keras.losses.Huber(delta=0.5, reduction="auto", name="huber_loss")
    
    # build and compile the model
    model = tf.keras.Model(inputs = driving_input, outputs = result)
    model.compile(optimizer=optimizer, loss=loss)
    
    # callbacks
    checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
    stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)
    
    batch_size = 32
    epochs = 10
    
    # fit 
    model.fit(x=x_train, y=ytrain4, shuffle=True, validation_split=0.2, epochs=epochs, 
              batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])
    

    Output

    Epoch 1/10
    1500/1500 [==============================] - 27s 18ms/step - loss: 5.2239 - val_loss: 3.6060
    Epoch 2/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 3.5634 - val_loss: 2.9022
    Epoch 3/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 3.0629 - val_loss: 2.5063
    Epoch 4/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.7615 - val_loss: 2.3764
    Epoch 5/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.5371 - val_loss: 2.1303
    Epoch 6/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.3848 - val_loss: 2.1373
    Epoch 7/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.2653 - val_loss: 1.9039
    Epoch 8/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.1581 - val_loss: 1.9087
    Epoch 9/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 2.0518 - val_loss: 1.7193
    Epoch 10/10
    1500/1500 [==============================] - 26s 17ms/step - loss: 1.9699 - val_loss: 1.8837
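
    Applied to the original model in the question, the two changes are small. A minimal sketch, assuming the driving_input and x tensors from the question's code are already defined; the 1e-4 starting rate is an example value, not a tuned one (with decay_steps=100000 the schedule barely decays within an epoch, so the initial rate is what matters):

    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.optimizers.schedules import ExponentialDecay
    from tensorflow.keras.losses import Huber

    # Fix 1: linear activation for the regression output
    result = Dense(1, activation='linear')(x)

    # Fix 2: a much lower initial learning rate than 0.1 (1e-4 is illustrative, not tuned)
    lr_schedule = ExponentialDecay(initial_learning_rate=1e-4, decay_steps=100000, decay_rate=0.95)

    model = Model(inputs=driving_input, outputs=result)
    model.compile(optimizer=Adam(learning_rate=lr_schedule), loss=Huber(delta=0.5))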