python, tensorflow, keras

Adding regularizer to an existing layer of a trained model without resetting weights?


Let's say I'm doing transfer learning with Inception. I add a few layers and train the model for a while.

Here is what my model topology looks like:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu', name='Dense_1')(x)
predictions = Dense(12, activation='softmax', name='Predictions')(x)
model = Model(inputs=base_model.input, outputs=predictions)

I train this model for a while, save it, and load it again for retraining; this time I want to add an L2 regularizer to Dense_1 without resetting its weights. Is this possible?

path = r'.\model.hdf5'
from keras.models import load_model
model = load_model(path)

The docs only show that a regularizer can be added as a parameter when you initialize a layer:

from keras import regularizers
model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))

This essentially creates a new layer, so my layer's weights would be reset.
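
One workaround that comes to mind (just a sketch; the Dense_1_reg name is made up here) is to recreate the layer with the regularizer and then transplant the trained weights into it, reusing the trained output layer as-is:

from keras import regularizers
from keras.layers import Dense
from keras.models import Model

# Sketch: recreate Dense_1 with the regularizer, then copy the trained
# weights across so they are not reset ('Dense_1_reg' is a made-up name).
old = model.get_layer('Dense_1')
new = Dense(1024, activation='relu', name='Dense_1_reg',
            kernel_regularizer=regularizers.l2(0.01))
x = new(old.input)                      # wire onto Dense_1's input tensor
x = model.get_layer('Predictions')(x)   # reuse the trained softmax layer
new_model = Model(inputs=model.input, outputs=x)
new_model.get_layer('Dense_1_reg').set_weights(old.get_weights())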

EDIT:

I've been playing around with the code for the past couple of days, and something strange happens to my loss when I load the model (after training for a bit with the new regularizer).

The first time I run this code (the first run with the new regularizer):

from keras import regularizers
from keras.models import load_model, Model
from keras.optimizers import SGD

base_model = load_model(path)  # path to the previously saved model
x = base_model.get_layer('dense_1').output
predictions = base_model.get_layer('dense_2')(x)   # reuse the trained output layer
model = Model(inputs=base_model.input, outputs=predictions)
# set the regularizer attribute directly on the already-built layer
model.get_layer('dense_1').kernel_regularizer = regularizers.l2(0.02)

model.compile(optimizer=SGD(lr=0.0001, momentum=0.90),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

My training output seems to be normal:

Epoch 43/50
 - 2918s - loss: 0.3834 - acc: 0.8861 - val_loss: 0.4253 - val_acc: 0.8723
Epoch 44/50
Epoch 00044: saving model to E:\Keras Models\testing_3\2018-01-18_44.hdf5
 - 2692s - loss: 0.3781 - acc: 0.8869 - val_loss: 0.4217 - val_acc: 0.8729
Epoch 45/50
 - 2690s - loss: 0.3724 - acc: 0.8884 - val_loss: 0.4169 - val_acc: 0.8748
Epoch 46/50
Epoch 00046: saving model to E:\Keras Models\testing_3\2018-01-18_46.hdf5
 - 2684s - loss: 0.3688 - acc: 0.8896 - val_loss: 0.4137 - val_acc: 0.8748
Epoch 47/50
 - 2665s - loss: 0.3626 - acc: 0.8908 - val_loss: 0.4097 - val_acc: 0.8763
Epoch 48/50
Epoch 00048: saving model to E:\Keras Models\testing_3\2018-01-18_48.hdf5
 - 2681s - loss: 0.3586 - acc: 0.8924 - val_loss: 0.4069 - val_acc: 0.8767
Epoch 49/50
 - 2679s - loss: 0.3549 - acc: 0.8930 - val_loss: 0.4031 - val_acc: 0.8776
Epoch 50/50
Epoch 00050: saving model to E:\Keras Models\testing_3\2018-01-18_50.hdf5
 - 2680s - loss: 0.3493 - acc: 0.8950 - val_loss: 0.4004 - val_acc: 0.8787

However, if I load the model after this mini-training session (I load the checkpoint from epoch 00050, so the new regularizer value should already be in effect), I get a really high loss value.

Code:

path = r'E:\Keras Models\testing_3\2018-01-18_50.hdf5' #50th epoch model

from keras.models import load_model
from keras.optimizers import SGD

model = load_model(path)
model.compile(optimizer=SGD(lr=0.0001, momentum=0.90),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Output:

Epoch 51/65
 - 3130s - loss: 14.0017 - acc: 0.8953 - val_loss: 13.9529 - val_acc: 0.8800
Epoch 52/65
Epoch 00052: saving model to E:\Keras Models\testing_3\2018-01-20_52.hdf5
 - 2813s - loss: 13.8017 - acc: 0.8969 - val_loss: 13.7553 - val_acc: 0.8812
Epoch 53/65
 - 2759s - loss: 13.6070 - acc: 0.8977 - val_loss: 13.5609 - val_acc: 0.8824
Epoch 54/65
Epoch 00054: saving model to E:\Keras Models\testing_3\2018-01-20_54.hdf5
 - 2748s - loss: 13.4115 - acc: 0.8992 - val_loss: 13.3697 - val_acc: 0.8824
Epoch 55/65
 - 2745s - loss: 13.2217 - acc: 0.9006 - val_loss: 13.1807 - val_acc: 0.8840
Epoch 56/65
Epoch 00056: saving model to E:\Keras Models\testing_3\2018-01-20_56.hdf5
 - 2752s - loss: 13.0335 - acc: 0.9014 - val_loss: 12.9951 - val_acc: 0.8840
Epoch 57/65
 - 2756s - loss: 12.8490 - acc: 0.9023 - val_loss: 12.8118 - val_acc: 0.8849
Epoch 58/65
Epoch 00058: saving model to E:\Keras Models\testing_3\2018-01-20_58.hdf5
 - 2749s - loss: 12.6671 - acc: 0.9032 - val_loss: 12.6308 - val_acc: 0.8849
Epoch 59/65
 - 2738s - loss: 12.4871 - acc: 0.9039 - val_loss: 12.4537 - val_acc: 0.8855
Epoch 60/65
Epoch 00060: saving model to E:\Keras Models\testing_3\2018-01-20_60.hdf5
 - 2765s - loss: 12.3086 - acc: 0.9059 - val_loss: 12.2778 - val_acc: 0.8868
Epoch 61/65
 - 2767s - loss: 12.1353 - acc: 0.9065 - val_loss: 12.1055 - val_acc: 0.8867
Epoch 62/65
Epoch 00062: saving model to E:\Keras Models\testing_3\2018-01-20_62.hdf5
 - 2757s - loss: 11.9637 - acc: 0.9061 - val_loss: 11.9351 - val_acc: 0.8883

Notice the really high loss values. Is this normal? I understand that the L2 regularizer would push the loss up (if there are large weights), but wouldn't that already be reflected in the first mini-training session, where I first added the regularizer? The accuracy stays consistent, though.
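
In case it helps anyone reproduce this, a quick diagnostic (a sketch, assuming the loaded model from above) is to inspect model.losses; weight regularizers depend only on the layer variables, so the penalty can be evaluated without feeding any data:

import keras.backend as K

print(model.losses)  # an empty list means no regularization loss is registered
if model.losses:
    # magnitude of the penalty currently baked into the reported loss;
    # kernel regularizers depend only on the weights, so no data is needed
    print(K.eval(sum(model.losses)))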

Thank you.


Solution

  • The solution from Marcin didn't work for me. As apatsekin mentioned, if you print layer.losses after adding the regularizers as Marcin proposed, you get an empty list.
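
    For illustration, this is the check being described (a sketch; 'dense_1' is just an example name, and model is a model whose layer was already built):

    from keras import regularizers

    layer = model.get_layer('dense_1')
    layer.kernel_regularizer = regularizers.l2(0.01)
    print(layer.losses)  # -> []: the attribute changed, but no loss was registered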

    I found a workaround that I don't like at all, but I am posting it here so that someone more capable can find an easier way to do this.

    I believe it works for most keras.applications networks. I copied the .py file of a specific architecture (for example, InceptionResNetV2) from keras-applications on GitHub to a local file regularizedNetwork.py on my machine. I had to edit it to fix some relative imports such as:

    #old version
    from . import imagenet_utils
    from .imagenet_utils import decode_predictions
    from .imagenet_utils import _obtain_input_shape
    
    backend = None
    layers = None
    models = None
    keras_utils = None
    

    to:

    #new version
    from keras import backend
    from keras import layers
    from keras import models
    from keras import utils as keras_utils
    
    from keras.applications import imagenet_utils
    from keras.applications.imagenet_utils import decode_predictions
    from keras.applications.imagenet_utils import _obtain_input_shape
    

    Once the relative paths and import issues were solved, I added the regularizers to each desired layer, just as you would when defining a new untrained network; see the sketch below. Usually the models from keras.applications load the pre-trained weights after the architecture is defined.
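
    For example, a layer definition inside the copied regularizedNetwork.py would change along these lines (a sketch; the exact layer and penalty value are placeholders):

    #inside regularizedNetwork.py (sketch)
    from keras import regularizers

    #before
    x = layers.Dense(1024, activation='relu', name='dense_1')(x)

    #after: same definition, now with the desired penalty
    x = layers.Dense(1024, activation='relu', name='dense_1',
                     kernel_regularizer=regularizers.l2(0.01))(x)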

    Now, in your main code/notebook, just import the new regularizedNetwork.py and call the main method to instantiate the network.

    #main code
    from regularizedNetwork import InceptionResNetV2
    

    The regularizers should be all set and you can fine-tune the regularized model normally, as in the sketch below.
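
    A minimal fine-tuning sketch (assuming the 12-class head from the question; x_train/y_train are placeholders for your data):

    from regularizedNetwork import InceptionResNetV2
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model
    from keras.optimizers import SGD

    #regularized base network; pre-trained weights load as usual
    base_model = InceptionResNetV2(weights='imagenet', include_top=False)
    x = GlobalAveragePooling2D()(base_model.output)
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(12, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)

    model.compile(optimizer=SGD(lr=0.0001, momentum=0.90),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=32, epochs=5)  # your data here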

    I am certain there is a less gimmicky way of doing this, so please, if someone finds it, write a new answer and/or comment on this answer.

    Just for the record, I also tried instantiating the model from keras.applications, rebuilding its architecture with regModel = Model.from_config(model.get_config()), adding the regularizers as Marcin suggested, and then loading the weights with regModel.set_weights(model.get_weights()), but it still didn't work; see the sketch below.
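
    For reference, that attempt looked roughly like this (a sketch reconstructed from the description above; 'dense_1' is an example layer name):

    from keras import regularizers
    from keras.models import Model

    #rebuild the architecture from the config, add the regularizer,
    #then copy the trained weights across
    regModel = Model.from_config(model.get_config())
    regModel.get_layer('dense_1').kernel_regularizer = regularizers.l2(0.01)
    regModel.set_weights(model.get_weights())
    #regModel.losses is still empty, so the penalty never makes it
    #into the training loss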
