
Keras Transfer Learning on own model


I am new to Transfer Learning and I have some problems setting up the code for it. I also read other posts with similar questions, but none of them helped me solve my issues.

I have trained a CNN from scratch on a large dataset I acquired myself. For that model, I saved only the weights in an HDF5 file.

Now, I want to use the same CNN architecture to build a model that classifies a different dataset, for which much less data is available.

This is the architecture of the new model:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization,
                                     Flatten, Dense, Dropout)

# Input
inputs = Input(shape=(200, 1), name='ip_inputs')

# Feature Extraction
conv1 = Conv1D(40, 3, kernel_initializer='he_normal', activation='relu',
               strides=2, padding='same', name='ip_C1')(inputs)
batchnorm1 = BatchNormalization(name='ip_BN1')(conv1)

# Flatten
flatten = Flatten(name='ip_F')(batchnorm1)

# Classification
dense1 = Dense(300, activation='relu', kernel_initializer='he_normal', name='ip_FC1')(flatten)
dropout1 = Dropout(0.4, name='D1')(dense1)
dense2 = Dense(300, activation='relu', kernel_initializer='he_normal', name='ip_FC2')(dropout1)
dropout2 = Dropout(0.3, name='D2')(dense2)

predictions = Dense(16, activation='softmax', kernel_initializer='he_normal', name='ip_FC6')(dropout2)

# Model
model = Model(inputs=inputs, outputs=predictions)

The old model I trained from scratch had a similar architecture but different input and output shapes.

With

model.load_weights(weights_path, by_name=True)

I load the weights I saved previously.

However, I do not know how to do Transfer Learning properly. Can someone give recommendations on the following questions:

  • For which layers should I load the weights? Only the conv layer, or others as well?
  • For which layers do I have to set trainable = False?

Thanks for any advice!


Solution

  • Using transfer learning with the exact same architecture is equivalent to retraining your model on the new data.

    Since the architectures are identical, I can't think of a better weight initialization than the weights you stored from the previous training, so I would load all of them, as you are currently doing.
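    As a minimal, self-contained sketch of how `by_name=True` matching behaves (using `tf.keras`; the second model, the layer names outside your snippet, and the file name are hypothetical): weights are loaded into layers whose names and shapes match, and everything else is skipped.

    ```python
    import numpy as np
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Input, Conv1D, Flatten, Dense

    # "Old" model: different input/output shapes, but the conv layer
    # shares its name with the new model below.
    old_in = Input(shape=(300, 1), name='old_inputs')
    x = Conv1D(40, 3, strides=2, padding='same', name='ip_C1')(old_in)
    x = Flatten()(x)
    old_out = Dense(8, name='old_head')(x)
    old_model = Model(old_in, old_out)
    old_model.save_weights('old_weights.h5')  # hypothetical path

    # "New" model: 'ip_C1' matches by name and shape, so its weights
    # are loaded; layers with non-matching names are skipped.
    new_in = Input(shape=(200, 1), name='ip_inputs')
    y = Conv1D(40, 3, strides=2, padding='same', name='ip_C1')(new_in)
    y = Flatten()(y)
    new_out = Dense(16, name='ip_FC6')(y)
    new_model = Model(new_in, new_out)
    new_model.load_weights('old_weights.h5', by_name=True)

    # The conv kernels are now identical in both models.
    print(np.allclose(old_model.get_layer('ip_C1').get_weights()[0],
                      new_model.get_layer('ip_C1').get_weights()[0]))
    ```

    Note that `by_name=True` works with the HDF5 weight format, and non-matching layers are skipped silently, so double-check that the layers you expect to transfer really share names.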

    How many layers to retrain should be decided based on your particular data.

    At one extreme, you can retrain the whole model (all layers). In this case, you adjust the model's deeper understanding of the data to better suit the new dataset. The computational cost is high, since you have to compute the gradients to update all parameters.
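    A minimal sketch of the full-retrain option (`tf.keras` assumed; the small model and the stand-in arrays `X_new` / `y_new` are hypothetical placeholders for your architecture and new dataset):

    ```python
    import numpy as np
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Input, Conv1D, Flatten, Dense
    from tensorflow.keras.optimizers import Adam

    inputs = Input(shape=(200, 1))
    x = Conv1D(8, 3, strides=2, padding='same', activation='relu')(inputs)
    x = Flatten()(x)
    outputs = Dense(16, activation='softmax')(x)
    model = Model(inputs, outputs)

    # Every layer stays trainable (the Keras default), so this is a full
    # retrain; a smaller learning rate than in the original training helps
    # keep the loaded weights from being destroyed in the first updates.
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss='categorical_crossentropy')

    # Hypothetical stand-ins for the new, smaller dataset.
    X_new = np.random.rand(32, 200, 1)
    y_new = np.eye(16)[np.random.randint(0, 16, 32)]
    model.fit(X_new, y_new, epochs=1, batch_size=8, verbose=0)
    ```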

    At the other extreme, you can retrain only the last layer. In this case, training is much faster (far fewer computations are involved), but your model is constrained to interpret the data much as it did before, which leads to a more superficial understanding of the new data.
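    The last-layer-only option can be sketched like this (`tf.keras` assumed, with a hypothetical small model standing in for yours); the key details are flipping `trainable` on every layer except the last and compiling afterwards:

    ```python
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Input, Conv1D, Flatten, Dense

    inputs = Input(shape=(200, 1))
    x = Conv1D(40, 3, strides=2, padding='same', activation='relu')(inputs)
    x = Flatten()(x)
    x = Dense(300, activation='relu')(x)
    outputs = Dense(16, activation='softmax')(x)
    model = Model(inputs, outputs)

    # Freeze everything except the final classification layer.
    for layer in model.layers[:-1]:
        layer.trainable = False

    # Compile AFTER changing the trainable flags; trainable changes made
    # after compiling are not picked up until you compile again.
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    ```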

    Usually the best option is somewhere in between, but that always depends on your particular data.
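    For your architecture, one plausible middle ground (a sketch, not a prescription) is to keep the loaded feature-extraction layers fixed and retrain the dense classification head on the new data. Here is a cut-down, self-contained version of your model illustrating that split, with the frozen layer names chosen by their `ip_C1` / `ip_BN1` prefixes from your snippet:

    ```python
    from tensorflow.keras import Model
    from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization,
                                         Flatten, Dense)

    inputs = Input(shape=(200, 1), name='ip_inputs')
    x = Conv1D(40, 3, strides=2, padding='same', activation='relu',
               name='ip_C1')(inputs)
    x = BatchNormalization(name='ip_BN1')(x)
    x = Flatten(name='ip_F')(x)
    x = Dense(300, activation='relu', name='ip_FC1')(x)
    outputs = Dense(16, activation='softmax', name='ip_FC6')(x)
    model = Model(inputs, outputs)

    # Middle ground: freeze the feature extractor, retrain the head.
    frozen = {'ip_C1', 'ip_BN1'}
    for layer in model.layers:
        layer.trainable = layer.name not in frozen

    model.compile(optimizer='adam', loss='categorical_crossentropy')
    ```

    If the new dataset turns out to be very different from the old one, you can then unfreeze the conv layer as well and fine-tune everything at a low learning rate.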