optimization tensorflow pre-trained-model

TensorFlow: load a pre-trained model and continue training with a different optimizer

I want to load a pre-trained model (trained with AdadeltaOptimizer) and continue training it with SGD (GradientDescentOptimizer). The model is saved and loaded with the TensorLayer API:

save model:

import tensorlayer as tl
tl.files.save_npz(network.all_params,
                  name=model_dir + "model-%d.npz" % global_step)

load model:

load_params = tl.files.load_npz(path=resume_dir + '/', name=model_name)
tl.files.assign_params(sess, load_params, network)

If I continue training with Adadelta, the training loss (cross entropy) looks normal: it starts close to the value the loaded model had reached. However, if I switch the optimizer to SGD, the training loss is as large as that of a newly initialized model.

I took a look at the model-xxx.npz file written by tl.files.save_npz. It only stores the model parameters as ndarrays, so I'm not sure how the optimizer or the learning rate is involved here.
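For anyone who wants to repeat that check, here is a minimal sketch of inspecting an .npz archive with NumPy. The file is created inline as a hypothetical stand-in for model-xxx.npz, and the key names (arr_0, arr_1) come from plain np.savez; the layout tl.files.save_npz actually writes may differ, so treat this as an illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for network.all_params: one weight matrix and one bias.
params = [np.zeros((4, 3), dtype=np.float32), np.ones(3, dtype=np.float32)]

# Write the arrays positionally; np.savez stores them as arr_0, arr_1, ...
path = os.path.join(tempfile.mkdtemp(), "model-0.npz")
np.savez(path, *params)

# Re-open the archive and record the shape and dtype of every entry.
with np.load(path) as archive:
    summary = {
        key: (archive[key].shape, str(archive[key].dtype))
        for key in archive.files
    }

for key, (shape, dtype) in summary.items():
    print(key, shape, dtype)
```

Every entry is a plain array; nothing in such a file records which optimizer produced the weights.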

Solution

You probably need to retrieve the loss tensor (the cross entropy) that previously fed your Adadelta optimizer from the restored graph, and then feed it to an SGD optimizer instead.

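import tensorflow as tf  # TF 1.x API

# sess is an open tf.Session; learning_rate is the SGD step size you choose
saver = tf.train.import_meta_graph('filename.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
cross_entropy = graph.get_tensor_by_name("entropy:0")  # loss tensor to import

# Build a new SGD training op on the restored loss
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)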

In this case, I gave the cross-entropy tensor the name entropy before training my pre-trained model, like this:

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv),
    name='entropy')

If you cannot modify your pre-trained model, you can list the tensors in the graph after importing it and work out which one you need. I have no experience with TensorLayer, so treat this as general guidance rather than a TensorLayer-specific recipe. The TensorLayer layers documentation should explain how to obtain your tensor. Since TensorLayer is built on top of TensorFlow, most TensorFlow functions should still be available.
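Listing the tensors in a graph could look roughly like the sketch below. It assumes a TensorFlow build that ships the tensorflow.compat.v1 module. The small graph built inline is only a hypothetical stand-in for your imported model, and find_tensor_names is a helper made up for this example, not part of TensorFlow or TensorLayer.

```python
import tensorflow.compat.v1 as tf

# Hypothetical stand-in for the imported model: a tiny graph whose loss
# tensor was named "entropy" when it was built.
graph = tf.Graph()
with graph.as_default():
    y_conv = tf.placeholder(tf.float32, shape=[None, 3], name="y_conv")
    y_ = tf.placeholder(tf.float32, shape=[None, 3], name="y_")
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv),
        name="entropy",
    )


def find_tensor_names(graph, keyword):
    """Return the names of all output tensors whose op name contains keyword."""
    return [
        out.name
        for op in graph.get_operations()
        for out in op.outputs
        if keyword in op.name
    ]


# Several internal ops may match; the list is a starting point for deciding
# which tensor is the loss you want.
candidates = find_tensor_names(graph, "entropy")
for name in candidates:
    print(name)

# Once the name is known, fetch the tensor the same way as in the answer.
cross_entropy = graph.get_tensor_by_name("entropy:0")
```

With a real checkpoint you would call the helper on tf.get_default_graph() after import_meta_graph, and try keywords such as "loss" or "cost" if the tensor was not named explicitly.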