TFSlim - problems loading saved checkpoint for VGG16


I'm trying to fine-tune a VGG-16 network using TF-Slim by loading pretrained weights into all layers except the fc8 layer. I did this with the TF-Slim functions as follows:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets

vgg = nets.vgg

# Specify where the Model, trained on ImageNet, was saved.
model_path = 'path/to/vgg_16.ckpt'

# Specify where the new model will live:
log_dir = 'path/to/log/'

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = vgg.vgg_16(images)

variables_to_restore = slim.get_variables_to_restore(exclude=['fc8'])
restorer = tf.train.Saver(variables_to_restore)




init = tf.initialize_all_variables()

with tf.Session() as sess:
   sess.run(init)
   restorer.restore(sess,model_path)
   print "model restored"

This works fine as long as I do not change num_classes for the VGG16 model. What I would like to do is change num_classes from 1000 to 200. I was under the impression that if I defined a new, modified vgg16 class whose fc8 produces 200 outputs (along with variables_to_restore = slim.get_variables_to_restore(exclude=['fc8'])), everything would be fine. However, TensorFlow complains of a dimension mismatch:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,4096,200] rhs shape= [1,1,4096,1000] 

So, how does one actually go about doing this? The documentation for TF-Slim is really patchy and there are several versions scattered across GitHub, so I'm not getting much help there.


Solution

  • You can try using slim's way of restoring: slim.assign_from_checkpoint.

    There is related documentation in the slim sources: https://github.com/tensorflow/tensorflow/blob/129665119ea60640f7ed921f36db9b5c23455224/tensorflow/contrib/slim/python/slim/learning.py

    Corresponding part:

    *************************************************
    * Fine-Tuning Part of a model from a checkpoint *
    *************************************************
    Rather than initializing all of the weights of a given model, we sometimes
    only want to restore some of the weights from a checkpoint. To do this, one
    need only filter those variables to initialize as follows:
      ...
      # Create the train_op
      train_op = slim.learning.create_train_op(total_loss, optimizer)
      checkpoint_path = '/path/to/old_model_checkpoint'
      # Specify the variables to restore via a list of inclusion or exclusion
      # patterns:
      variables_to_restore = slim.get_variables_to_restore(
          include=["conv"], exclude=["fc8", "fc9"])
      # or
      variables_to_restore = slim.get_variables_to_restore(exclude=["conv"])
      init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
          checkpoint_path, variables_to_restore)
      # Create an initial assignment function.
      def InitAssignFn(sess):
          sess.run(init_assign_op, init_feed_dict)
      # Run training.
      slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)
    

    Update

    I tried the following:

    import tensorflow as tf
    import tensorflow.contrib.slim as slim
    import tensorflow.contrib.slim.nets as nets
    images = tf.placeholder(tf.float32, [None, 224, 224, 3])
    predictions = nets.vgg.vgg_16(images)
    print [v.name for v in slim.get_variables_to_restore(exclude=['fc8']) ]
    

    And got this output (shortened):

    [u'vgg_16/conv1/conv1_1/weights:0',
     u'vgg_16/conv1/conv1_1/biases:0',
     …
     u'vgg_16/fc6/weights:0',
     u'vgg_16/fc6/biases:0',
     u'vgg_16/fc7/weights:0',
     u'vgg_16/fc7/biases:0',
     u'vgg_16/fc8/weights:0',
     u'vgg_16/fc8/biases:0']
    

    Note that the fc8 variables are still in the list, so exclude=['fc8'] had no effect. It looks like you should prefix the scope with vgg_16:

    print [v.name for v in slim.get_variables_to_restore(exclude=['vgg_16/fc8']) ]
    

    gives (shortened):

    [u'vgg_16/conv1/conv1_1/weights:0',
     u'vgg_16/conv1/conv1_1/biases:0',
     …
     u'vgg_16/fc6/weights:0',
     u'vgg_16/fc6/biases:0',
     u'vgg_16/fc7/weights:0',
     u'vgg_16/fc7/biases:0']
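
The difference between the two outputs is consistent with the exclude entries being matched against the beginning of each variable name rather than anywhere inside it. The following is a minimal pure-Python sketch of that behavior, not slim's actual implementation; the helper `filter_by_scope_prefix` is hypothetical and the name list is abbreviated from the output above.

```python
import re

# Variable names abbreviated from the output above.
names = [
    'vgg_16/conv1/conv1_1/weights:0',
    'vgg_16/fc7/weights:0',
    'vgg_16/fc8/weights:0',
    'vgg_16/fc8/biases:0',
]


def filter_by_scope_prefix(names, exclude):
    """Hypothetical helper: drop names that start with any exclude scope.

    Assumption: scopes are matched with re.match, i.e. anchored at the
    start of the name, which is what the outputs above suggest.
    """
    return [n for n in names
            if not any(re.match(scope, n) for scope in exclude)]


# 'fc8' does not match at the start of 'vgg_16/fc8/...', so nothing is dropped.
kept_short = filter_by_scope_prefix(names, ['fc8'])
# 'vgg_16/fc8' does match at the start, so both fc8 variables are dropped.
kept_full = filter_by_scope_prefix(names, ['vgg_16/fc8'])

print(kept_short)
print(kept_full)
```

If this assumption holds, any exclude pattern has to start at the outermost scope of the model (here vgg_16) to have an effect.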
    

    Update 2

    Complete example that executes without errors (on my system):

    import tensorflow as tf
    import tensorflow.contrib.slim as slim
    import tensorflow.contrib.slim.nets as nets
    
    s = tf.Session(config=tf.ConfigProto(gpu_options={'allow_growth':True}))
    
    images = tf.placeholder(tf.float32, [None, 224, 224, 3])
    predictions = nets.vgg.vgg_16(images, 200)
    variables_to_restore = slim.get_variables_to_restore(exclude=['vgg_16/fc8'])
    init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', variables_to_restore)
    s.run(init_assign_op, init_feed_dict)
    

    In the example above, vgg16.ckpt is a checkpoint saved by tf.train.Saver for the 1000-class VGG16 model.

    Using this checkpoint with all variables of the 200-class model (including fc8) gives the following error:

    init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', slim.get_variables_to_restore())
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
          1 init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
    ----> 2       './vgg16.ckpt', slim.get_variables_to_restore())
    
    /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.pyc in assign_from_checkpoint(model_path, var_list)
        527     assign_ops.append(var.assign(placeholder_value))
        528
    --> 529     feed_dict[placeholder_value] = var_value.reshape(var.get_shape())
        530
        531   assign_op = control_flow_ops.group(*assign_ops)
    
    ValueError: total size of new array must be unchanged
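
The error comes down to element counts: the checkpoint's fc8 weights have shape [1, 1, 4096, 1000] while the 200-class model expects [1, 1, 4096, 200], so no reshape can succeed. One possible guard, sketched below in plain Python, is to compare shapes before building the restore list and skip anything that differs. The helper `restorable_names` and both shape dictionaries are hypothetical; in TF 1.x the checkpoint side could presumably be filled from `tf.train.NewCheckpointReader(path).get_variable_to_shape_map()`, which is an assumption about your setup and is not exercised here.

```python
from functools import reduce
from operator import mul

# Shapes as stored in the checkpoint (fc8 weights taken from the error
# message in the question; the other entries are illustrative).
checkpoint_shapes = {
    'vgg_16/fc7/weights': [1, 1, 4096, 4096],
    'vgg_16/fc8/weights': [1, 1, 4096, 1000],
    'vgg_16/fc8/biases': [1000],
}

# Shapes of the same variables in the new 200-class graph.
model_shapes = {
    'vgg_16/fc7/weights': [1, 1, 4096, 4096],
    'vgg_16/fc8/weights': [1, 1, 4096, 200],
    'vgg_16/fc8/biases': [200],
}


def num_elements(shape):
    """Total number of values in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def restorable_names(model_shapes, checkpoint_shapes):
    """Hypothetical helper: names whose shapes agree in model and checkpoint."""
    return sorted(name for name, shape in model_shapes.items()
                  if checkpoint_shapes.get(name) == shape)


old_count = num_elements(checkpoint_shapes['vgg_16/fc8/weights'])
new_count = num_elements(model_shapes['vgg_16/fc8/weights'])
print(old_count, new_count)
print(restorable_names(model_shapes, checkpoint_shapes))
```

Variables that are skipped this way (here the fc8 weights and biases) are not restored and still need their own initialization before training.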