Tags: tensorflow, conv-neural-network, yolo, checkpoint

How to add layers to a pre-trained model related to variables in the checkpoint?


I'm new to TensorFlow and neural networks, so forgive me if I don't understand your answers right away.

I'm trying to add a batch normalization layer after every convolutional layer in a YOLO v1 implementation that I found on the Internet.

Below is the batch normalization function I used:

def batchnorm(self, inp):
    with tf.variable_scope("batchnorm"):
        channels = inp.get_shape()[3]
        offset = tf.get_variable("offset",
                                 channels,
                                 dtype=tf.float32,
                                 initializer=tf.zeros_initializer())
        scale = tf.get_variable("scale",
                                channels,
                                dtype=tf.float32,
                                initializer=tf.random_normal_initializer(1.0, 0.02))

        mean, variance = tf.nn.moments(inp, axes=[0, 1, 2], keep_dims=False)
        variance_epsilon = 1e-5
        normalized = tf.nn.batch_normalization(inp, mean, variance,
                                               offset, scale, variance_epsilon)
    return normalized

Below is the graph-building part of the YOLO v1 code I'm using:

    if self.verbose:
        print('Building Yolo Graph....')
    # Reset default graph
    tf.reset_default_graph()
    # Input placeholder
    self.x = tf.placeholder('float32', [None, 448, 448, 3])
    self.label_batch = tf.placeholder('float32', [None, 73])

    # conv1, pool1
    self.conv1 = self.conv_layer(1, self.x, 64, 7, 2)
    self.pool1 = self.maxpool_layer(2, self.conv1, 2, 2)
    # size reduced to 64x112x112
    # conv2, pool2
    self.conv2 = self.conv_layer(3, self.pool1, 192, 3, 1)
    self.pool2 = self.maxpool_layer(4, self.conv2, 2, 2)
    # size reduced to 192x56x56
    # conv3, conv4, conv5, conv6, pool3
    self.conv3 = self.conv_layer(5, self.pool2, 128, 1, 1)
    self.conv4 = self.conv_layer(6, self.conv3, 256, 3, 1)
    self.conv5 = self.conv_layer(7, self.conv4, 256, 1, 1)
    self.conv6 = self.conv_layer(8, self.conv5, 512, 3, 1)
    self.pool3 = self.maxpool_layer(9, self.conv6, 2, 2)
    # size reduced to 512x28x28
    # conv7 - conv16, pool4
    self.conv7 = self.conv_layer(10, self.pool3, 256, 1, 1)
    self.conv8 = self.conv_layer(11, self.conv7, 512, 3, 1)
    self.conv9 = self.conv_layer(12, self.conv8, 256, 1, 1)
    self.conv10 = self.conv_layer(13, self.conv9, 512, 3, 1)
    self.conv11 = self.conv_layer(14, self.conv10, 256, 1, 1)
    self.conv12 = self.conv_layer(15, self.conv11, 512, 3, 1)
    self.conv13 = self.conv_layer(16, self.conv12, 256, 1, 1)
    self.conv14 = self.conv_layer(17, self.conv13, 512, 3, 1)
    self.conv15 = self.conv_layer(18, self.conv14, 512, 1, 1)
    self.conv16 = self.conv_layer(19, self.conv15, 1024, 3, 1)
    self.pool4 = self.maxpool_layer(20, self.conv16, 2, 2)
    # size reduced to 1024x14x14
    # conv17 - conv24
    self.conv17 = self.conv_layer(21, self.pool4, 512, 1, 1)
    self.conv18 = self.conv_layer(22, self.conv17, 1024, 3, 1)
    self.conv19 = self.conv_layer(23, self.conv18, 512, 1, 1)
    self.conv20 = self.conv_layer(24, self.conv19, 1024, 3, 1)
    self.conv21 = self.conv_layer(25, self.conv20, 1024, 3, 1)
    self.conv22 = self.conv_layer(26, self.conv21, 1024, 3, 2)
    self.conv23 = self.conv_layer(27, self.conv22, 1024, 3, 1)
    self.conv24 = self.conv_layer(28, self.conv23, 1024, 3, 1)

    # size reduced to 1024x7x7
    # fc1, fc2, fc3
    self.fc1 = self.fc_layer(29, self.conv24, 512,
                             flatten=True, linear=False)
    self.fc2 = self.fc_layer(
        30, self.fc1, 4096, flatten=False, linear=False)
    self.fc3 = self.fc_layer(
        31, self.fc2, 1470, flatten=False, linear=True)

    varlist = self.print_tensors_in_checkpoint_file(file_name=self.weightFile, all_tensors=True, tensor_name=None)
    variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    self.saver = tf.train.Saver(variables[:len(varlist)])

    self.loss = self.calculate_loss_function(self.fc3 , self.label_batch)

    self.sess = tf.Session()

    self.saver.restore(self.sess, self.weightFile)

    self.only_restore_conv20 = False
    if self.only_restore_conv20:
        after_20_initializer = [var.initializer for var in tf.global_variables()[3:]]
        self.sess.run(after_20_initializer)

    #exerpath = 'C:/Users/dml/PycharmProjects/YOLOv1-master/exer_ckpt/exer.ckpt'

    self.training = tf.train.MomentumOptimizer(momentum=0.5, learning_rate=1e-4).minimize(self.loss)

    Momentum_initializers = [var.initializer for var in tf.global_variables() if 'Momentum' in var.name]

    self.sess.run(Momentum_initializers)

Finally, after putting a batchnorm layer right after the conv1 layer like this:

    self.conv1 = self.conv_layer(1, self.x, 64, 7, 2)
    self.bn1 = self.batchnorm(self.conv1)
    self.pool1 = self.maxpool_layer(2, self.bn1, 2, 2)

I get the following error:

NotFoundError: Key batchnorm/offset not found in checkpoint
 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

After a few days of struggling, I found out that the error comes from restoring weights from the checkpoint file: my batchnorm variables are not in the checkpoint. But I can't figure out how to make my code work.


Solution

  • You are right. When you load a checkpoint, the Saver tries to restore the value of every variable it was given, and it raises an error if any of those variables is not found in the checkpoint file.

    I guess your checkpoint file does not contain the variables of your new normalization layer. If so, this checkpoint is probably useless: the pre-trained variable values will likely give pretty bad results when used in a new network structure (with your normalization layer after each conv layer).

    If you still want to try using the pre-trained weights from the checkpoint file, you will need to load the variable values from the checkpoint yourself. Assuming the names and shapes of the existing variables did not change, you should be able to use a version of the optimistic_restore function in this gist. The gist shows an example of adding a layer after creating a checkpoint, which is your exact case.
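
Here is a minimal sketch of that kind of optimistic restore, assuming TensorFlow 1.x graph mode as in the question. The names select_restorable and optimistic_restore are made up for this illustration and are not copied from the gist. The idea is to restore only the variables whose name and shape match an entry in the checkpoint, and to run the initializers of everything else (for example batchnorm/offset and batchnorm/scale).

```python
def select_restorable(graph_shapes, ckpt_shapes):
    """Split graph variable names into (restorable, missing).

    Both arguments map a variable name to its shape (a list of ints).
    A variable is restorable only if the checkpoint holds an entry
    with the same name and the same shape.
    """
    restorable, missing = [], []
    for name in sorted(graph_shapes):
        if name in ckpt_shapes and list(ckpt_shapes[name]) == list(graph_shapes[name]):
            restorable.append(name)
        else:
            missing.append(name)
    return restorable, missing


def optimistic_restore(sess, ckpt_path):
    """Restore matching variables and initialize the rest.

    Hypothetical helper; assumes TensorFlow 1.x and that the graph
    has already been built in the default graph.
    """
    import tensorflow as tf  # imported lazily so the helper above needs no TF

    # Names and shapes stored in the checkpoint file.
    reader = tf.train.NewCheckpointReader(ckpt_path)
    ckpt_shapes = reader.get_variable_to_shape_map()

    # Names and shapes of the variables in the current graph.
    by_name = {v.op.name: v for v in tf.global_variables()}
    graph_shapes = {n: v.get_shape().as_list() for n, v in by_name.items()}

    restorable, missing = select_restorable(graph_shapes, ckpt_shapes)
    if restorable:
        saver = tf.train.Saver([by_name[n] for n in restorable])
        saver.restore(sess, ckpt_path)
    # New variables (e.g. the batchnorm ones) get their initial values instead.
    sess.run(tf.variables_initializer([by_name[n] for n in missing]))
    return restorable, missing
```

In the question's code this would stand in for the tf.train.Saver(variables[:len(varlist)]) and self.saver.restore(...) lines, for example optimistic_restore(self.sess, self.weightFile). Selecting by name rather than by position matters here: global variables are listed in creation order, so the batchnorm variables created right after conv1 presumably fall inside the first len(varlist) entries, which would explain the NotFoundError.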