TensorFlow - saving and restoring operations

My goal is simple: train a model in TensorFlow, save it, and restore it later, either to continue training or to use some of its functions/operations.

Here is a simple example of the model:

import tensorflow as tf
import numpy as np

BATCH_SIZE = 3
VECTOR_SIZE = 1
LEARNING_RATE = 0.1

x = tf.placeholder(tf.float32, [BATCH_SIZE, VECTOR_SIZE],
                   name='input_placeholder')
y = tf.placeholder(tf.float32, [BATCH_SIZE, VECTOR_SIZE],
                   name='labels_placeholder')

W = tf.get_variable('W', [VECTOR_SIZE, BATCH_SIZE])
b = tf.get_variable('b', [VECTOR_SIZE], initializer=tf.constant_initializer(0.0))

y_hat = tf.matmul(W, x) + b
predict = tf.matmul(W, x) + b
total_loss = tf.reduce_mean(y-y_hat)
train_step = tf.train.AdagradOptimizer(LEARNING_RATE).minimize(total_loss)
X = np.ones([BATCH_SIZE, VECTOR_SIZE])
Y = np.ones([BATCH_SIZE, VECTOR_SIZE])
all_saver = tf.train.Saver() 

sess= tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([train_step], feed_dict={x: X, y: Y})
save_path =  r'C:\some_path\save\\'
all_saver.save(sess,save_path)

Now we restore it here:

meta_path = r'C:\some_path\save\.meta'
new_all_saver = tf.train.import_meta_graph(meta_path)

graph = tf.get_default_graph()
all_ops = graph.get_operations()
for el in all_ops:
    print(el)

Among the restored operations I cannot find predict or train_step from the original code. Do I need to name these operations before saving? How can I get predict back and run something like this:

sess=tf.Session()
sess.run([predict], feed_dict = {x:X})

P.S. I have read many tutorials on saving and restoring in TensorFlow, but I still have a poor understanding of how it all works.

Solution

Your operations are there in the restored model, but since you haven't named them, they are named according to default rules. For example, since you have:

predict = tf.matmul(W, x) + b

then the operation representing predict might look like:

name: "add"
op: "Add"
input: "MatMul"
input: "b/read"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}

This is one of the entries printed by your for el in all_ops: loop. The name of the operation is "add", which was assigned automatically; the operation type ("op") is "Add", which corresponds to the last operation performed in that line of code (the +); and the inputs are "MatMul" and "b/read", the two things you summed. To be clear, I am not sure that this exact operation corresponds to that line of code, since the printout contains other adds with the same kind of inputs, but it is a possible candidate.

To sum up so far: your operations are there, and you are seeing them when you print. But why don't you see the word "predict"? Because it is not the name of a tensor or operation in the TensorFlow graph; it is only the name of a Python variable in your code.

So how can you access "predict"? Through its name as stated in the graph. In the case above, that name could be "add" if my guess is right, but let's give your "predict" an explicit name instead, so that you control which operation corresponds to it.

In order to name your "predict", let's add the following line of code just below predict = tf.matmul(W, x) + b:

predict_named = tf.identity(predict, "here_i_put_a_name")

This line creates a new operation that receives the output of "predict" as input and produces an output equal to that input. The operation itself does not do much (it just repeats a value), but it lets us attach a name. Now, if you search your printout, you will find:

name: "add_1"
op: "Add"
input: "MatMul_1"
input: "b/read"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}

name: "here_i_put_a_name"
op: "Identity"
input: "add_1"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}

Nice! Now 1) you can access your "predict" using the name "here_i_put_a_name", and 2) we can confirm that your "predict" was in fact the operation named "add_1": just check the "input" attribute of the operation "here_i_put_a_name" above.
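The naming step can be checked in isolation with a minimal sketch like the one below. It assumes the TensorFlow 1.x API, imported through tensorflow.compat.v1 (available from roughly TF 1.14 onward and in TF 2.x), and it builds the ops in an explicit tf.Graph so that the names are not affected by anything already in the default graph. The shapes are copied from the question; the explicit graph and the compat import are my additions, not part of the original code.

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style API (assumption: TF >= 1.14 or TF 2.x)

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [3, 1], name="input_placeholder")
    W = tf.get_variable("W", [1, 3])
    b = tf.get_variable("b", [1], initializer=tf.constant_initializer(0.0))

    # Unnamed: TensorFlow picks a default name for the "+" op.
    predict = tf.matmul(W, x) + b
    # Named: an Identity op that simply forwards the value of predict.
    predict_named = tf.identity(predict, name="here_i_put_a_name")

print(predict.name)        # auto-generated name, depends on the graph contents
print(predict_named.name)  # here_i_put_a_name:0
# The Identity op's only input is the tensor produced by predict.
print(predict_named.op.inputs[0].name == predict.name)  # True
```

The Python variable names (predict, predict_named) never reach the graph; only the name= argument does.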

With that done, let's access the operation "here_i_put_a_name" and get some predictions. First, change your save_path and meta_path by adding a file name at the end, for example:

save_path =  r'C:\some_path\save\my_model_name'
meta_path = r'C:\some_path\save\my_model_name.meta'

Then, at the end of your restoring code, add:

with tf.Session(graph=graph) as sess:

    new_all_saver.restore(sess,save_path)
    my_prediction = sess.run(["here_i_put_a_name:0"], feed_dict={"input_placeholder:0": [[1],[2],[3]]})
    print(my_prediction)

With this block, you create a new TensorFlow session that uses the graph stored in your variable "graph". Inside this context, you restore the variable values saved at save_path into the current session. Then you run a prediction or, more exactly, you run the operation "here_i_put_a_name" and fetch its first output (which is why ":0" follows the name). The feed dict gives the value [[1],[2],[3]] to the tensor "input_placeholder:0" (again, the ":0" shows that this is a tensor, not an operation).
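Putting the pieces above together, a save-and-restore round trip might look like the following sketch. Several things here are assumptions of mine rather than part of the original code: the tensorflow.compat.v1 import (TF 1.14+ or TF 2.x), a temporary directory in place of C:\some_path\save, and two separate tf.Graph objects so that the restore step clearly starts from an empty graph. The training step is left out to keep the focus on naming and restoring.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style API (assumption: TF >= 1.14 or TF 2.x)

# Hypothetical location; the original uses C:\some_path\save\my_model_name.
save_path = os.path.join(tempfile.mkdtemp(), "my_model_name")
sample = [[1], [2], [3]]

# 1) Build the model, name the prediction, and save a checkpoint.
build_graph = tf.Graph()
with build_graph.as_default():
    x = tf.placeholder(tf.float32, [3, 1], name="input_placeholder")
    W = tf.get_variable("W", [1, 3])
    b = tf.get_variable("b", [1], initializer=tf.constant_initializer(0.0))
    predict = tf.identity(tf.matmul(W, x) + b, name="here_i_put_a_name")
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        expected = sess.run(predict, feed_dict={x: sample})
        saver.save(sess, save_path)

# 2) Restore into an empty graph and address everything by graph name.
restore_graph = tf.Graph()
with restore_graph.as_default():
    new_saver = tf.train.import_meta_graph(save_path + ".meta")
    with tf.Session() as sess:
        new_saver.restore(sess, save_path)
        restored = sess.run("here_i_put_a_name:0",
                            feed_dict={"input_placeholder:0": sample})

# The restored graph reproduces the prediction made before saving.
print(np.allclose(expected, restored))  # True
```

Note that the restoring half never refers to the Python variables x or predict; the graph names are enough.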

With all that said and with the questions (hopefully) answered, I have some comments:

1) In my experience, it can be nice to use the library tf.saved_model to save and restore models. But this is my personal suggestion.

2) I limited myself to answering your questions about naming and invoking operations, so I have ignored the training and predicting routines. But I think you are approaching the problem incorrectly by giving your variable X a size of BATCH_SIZE.

3) Notice the difference between "blabla" and "blabla:0". The first is the name of an operation and the second is the name of a tensor (that operation's first output).
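To make the last comment concrete, here is a small sketch of the two lookups, again assuming the TF 1.x API via tensorflow.compat.v1. The constant named "blabla" is only a stand-in for any op in your graph.

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style API (assumption: TF >= 1.14 or TF 2.x)

graph = tf.Graph()
with graph.as_default():
    tf.constant([1.0, 2.0], name="blabla")  # stand-in op for illustration

# "blabla" names the operation (the node in the graph).
op = graph.get_operation_by_name("blabla")
# "blabla:0" names the first output tensor of that operation.
tensor = graph.get_tensor_by_name("blabla:0")

print(op.name)      # blabla
print(tensor.name)  # blabla:0
print(tensor.op.name == op.name)  # True

# Asking for a tensor with an operation name is rejected.
try:
    graph.get_tensor_by_name("blabla")
    lookup_failed = False
except ValueError:
    lookup_failed = True
print(lookup_failed)  # True
```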