I am working through this tutorial and noticed the following in its code: when evaluating predictions, the author runs accuracy, which runs the correct tensor, which in turn runs prediction, which would reinitialize the weights with random values and reconstruct the NN model. How can this be right? What am I missing?
def neural_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    hidden_3_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases': tf.Variable(tf.random_normal([n_classes]))}

    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)
    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)
    output = tf.matmul(l3, output_layer['weights']) + output_layer['biases']
    return output

def train_neural_network(x):
    prediction = neural_network_model(x)
    # Named arguments: TensorFlow 1.x rejects positional logits/labels here.
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    hm_epochs = 10
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(mnist.train.num_examples/batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

train_neural_network(x)

You almost got it right. The accuracy tensor indirectly depends on the prediction tensor, which in turn depends on a tensor x. Your code snippet does not include what x actually is; however, from the linked tutorial:
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')
So x is a placeholder, i.e. a tensor that obtains its value directly from the user. The last line, train_neural_network(x), makes this easy to miss: it does not call a transformation function that takes an x and processes it on the fly, as you would expect from a regular function. Rather, the function uses a reference to the previously defined placeholders (dummies, really) in order to define a computation graph, which it then directly executes using a session.
The graph, however, is only constructed once, by the single call to neural_network_model(x), and is then queried repeatedly over the given number of epochs.
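To make the "constructed once, queried many times" point more concrete, here is a minimal sketch in plain Python. It is only a toy analogy, not how TensorFlow is implemented; ToyVariable, build_model and run are made-up names for illustration.

```python
import random


class ToyVariable:
    """Hypothetical stand-in for tf.Variable: holds one persistent value."""
    created = 0  # counts how many variables were ever constructed

    def __init__(self):
        ToyVariable.created += 1
        self.value = None  # unset until the initializer runs

    def initialize(self):
        # Plays the role of tf.global_variables_initializer(): random init.
        self.value = random.gauss(0.0, 1.0)


def build_model():
    """Plays the role of neural_network_model(x): creates the variable and
    returns a description of the computation without computing anything."""
    w = ToyVariable()

    def prediction(x):
        # Evaluated later, once a value for the placeholder x is fed in.
        return w.value * x

    return w, prediction


def run(node, feed):
    """Plays the role of sess.run(node, feed_dict=...): evaluation only."""
    return node(feed)


w, prediction = build_model()   # graph construction: happens exactly once
w.initialize()                  # random initialization: happens exactly once

first = run(prediction, 2.0)
second = run(prediction, 2.0)   # no new variable, no re-initialization
```

Both evaluations return the same number because run only reads the variable that was created and initialized beforehand; nothing is rebuilt.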
What you missed is this:
_, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
This queries the result of the optimizer operation and the cost tensor given that the input values are epoch_x for x and epoch_y for y, pulling data through all defined computation nodes, all the way back "down" to x. In order to obtain the cost, y is needed as well. Both are provided by the caller. The AdamOptimizer will update all trainable variables as part of its execution, changing the network's weights.
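As a rough illustration of how running the training operation changes the stored weights, while later evaluations merely read them, consider the following plain-Python sketch. It assumes a single weight and plain gradient descent instead of Adam, and the names state, predict and train_step are hypothetical.

```python
# One trainable weight, kept outside the functions like a graph variable.
state = {"w": 0.0}


def predict(x):
    # Reads the current weight; never modifies it.
    return state["w"] * x


def train_step(x, y, learning_rate=0.1):
    """One gradient-descent update on the squared error (w*x - y)**2.
    Stands in for sess.run(optimizer, feed_dict={x: ..., y: ...})."""
    error = predict(x) - y
    gradient = 2.0 * error * x
    state["w"] -= learning_rate * gradient  # the side effect of training
    return error ** 2                       # cost before this update


# The training data follows y = 3 * x, so w should approach 3.
costs = [train_step(1.0, 3.0) for _ in range(50)]

w_after_training = state["w"]
test_prediction = predict(2.0)  # evaluation: uses the trained w, leaves it alone
```

Only train_step writes to the weight; calling predict afterwards, with whatever input, works on the trained value.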
After that,
accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
or, equivalently
sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
then issues another evaluation of the same graph, without changing it, but this time using the inputs mnist.test.images for x and mnist.test.labels for y.
It works because prediction itself depends on x, which is overridden with the user-provided values on each call to sess.run(...).
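For reference, what the correct and accuracy nodes compute once values have been fed in can be sketched in plain Python. The scores and labels below are made-up values, and argmax is a hypothetical helper that mirrors tf.argmax(..., 1) for a single row.

```python
def argmax(row):
    """Index of the largest entry in a row (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


# Made-up network outputs for 4 examples and 3 classes.
predictions = [
    [2.0, 0.5, 0.1],
    [0.2, 0.3, 3.0],
    [1.5, 1.7, 0.0],
    [0.9, 0.1, 0.4],
]
# Made-up one-hot labels for the same 4 examples.
labels = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]

# Mirrors: correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]

# Mirrors: accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
accuracy = sum(float(c) for c in correct) / len(correct)
```

Three of the four predicted classes match their labels here, so accuracy comes out as 0.75.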
Here is what the graph looks like in TensorBoard. It's hard to tell, but the two placeholder nodes are on the bottom left, next to the orange node called "Variable" and in the center right, below the green "Slice_1".
Here's what the relevant part of the network's graph looks like; I exported this using TensorBoard. It's a bit hard to read since the nodes are not manually labeled (except for a couple I labeled myself), but there are six relevant points here. Placeholders are yellow: on the bottom right you'll find x, and y is on the center left.
Green are the intermediate values that make sense to us: on the left is the prediction tensor, on the right there's the tensor called correct. The blue parts are endpoints of the graph: on the top left there's the cost tensor and on the top right you'll find accuracy. In essence, data flows from the bottom to the top.
So, whenever you say "evaluate prediction given x", "evaluate accuracy given x and y" or "optimize my network given x and y", you really just provide values on the yellow ends and observe the outcome on the green or blue ones.