python tensorflow keras deep-learning oserror

Transfer Learning Trainable Model Throws Error on Saving


I have downloaded a pretrained model and I'm trying to apply transfer learning to it. I load the model, which is saved as the file 'xray_model.h5', and set it as untrainable:

model = tf.keras.models.load_model('xray_model.h5')
model.trainable = False

Later I take the first layer's input and the output of the "flatten" layer, and build my additions on top of them:

base_input = model.layers[0].input
base_output = model.get_layer(name="flatten").output

base_output = build_model()(base_output)

new_model = keras.Model(inputs=base_input, outputs=base_output)

Since I want to train my own layers (and, after some experimenting, I realized that I might need to train the old layers too), I set every layer of the model as trainable:

# Iterate over the public `layers` list rather than the private `_layers` attribute.
for layer in new_model.layers:
    layer.trainable = True

But when I start training it with the following metrics and callbacks:

METRICS = ['accuracy',
           tf.keras.metrics.Precision(name='precision'),
           tf.keras.metrics.Recall(name='recall'),
           lr_metric]

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001, verbose=1)

save_callback = tf.keras.callbacks.ModelCheckpoint("new_xray_model.h5",
                                                   save_best_only=True,
                                                   monitor='accuracy')
history = new_model.fit(train_generator,
                        verbose=1,
                        steps_per_epoch=BATCH_SIZE,
                        epochs=EPOCHS,
                        validation_data=test_generator,
                        callbacks=[save_callback, reduce_lr])

I get the following error:

File "C:\Users\jm10o\AppData\Local\Programs\Python\Python38\lib\site-packages\h5py\_hl\group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

Process finished with exit code 1

I noticed that this happens only when I try to further train the model that I loaded. I couldn't find a solution for it.


Solution

  • The problem comes from the ModelCheckpoint callback: at every epoch, the model is saved under the same file name.

    Use a file name that includes the epoch number, in the following format:

    ModelCheckpoint('your_model_name{epoch:0d}.h5',
                    monitor='accuracy')
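To see why the placeholder gives each epoch its own file, here is a minimal sketch in plain Python. It assumes the callback fills in `{epoch}` with ordinary `str.format`-style formatting and 1-based epoch numbers. The helper `checkpoint_names` is hypothetical and exists only for illustration; it is not part of Keras.

```python
# Template taken from the answer above; {epoch:0d} is a standard
# Python format field that renders the epoch as a decimal integer.
TEMPLATE = 'your_model_name{epoch:0d}.h5'


def checkpoint_names(template, num_epochs):
    """Hypothetical helper: list the file name the template yields per epoch.

    Assumes 1-based epoch numbering when the name is formatted.
    """
    return [template.format(epoch=epoch) for epoch in range(1, num_epochs + 1)]


names = checkpoint_names(TEMPLATE, 3)
print(names)
# ['your_model_name1.h5', 'your_model_name2.h5', 'your_model_name3.h5']

# A template without the placeholder maps every epoch to the same file.
fixed = checkpoint_names('new_xray_model.h5', 3)
print(len(set(fixed)))
# 1
```

A zero-padded field such as `{epoch:02d}` would yield `01`, `02`, and so on, which keeps the files in order when sorted by name.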