
TensorFlow MNIST: Evaluating predictions


I am working through this tutorial and found the following code. When evaluating predictions, he runs accuracy, which runs the correct variable, which in turn runs prediction, which would then reinitialize the weights with random values and rebuild the NN model. How can this be right? What am I missing?

def neural_network_model(data):
    hidden_1_layer = {'weights':tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}

    hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}

    hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}

    output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases':tf.Variable(tf.random_normal([n_classes])),}


    l1 = tf.add(tf.matmul(data,hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1,hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    l3 = tf.add(tf.matmul(l2,hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)

    output = tf.matmul(l3,output_layer['weights']) + output_layer['biases']

    return output

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 10
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(mnist.train.num_examples/batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))

        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))

train_neural_network(x)

Solution

  • You almost got it right. The accuracy tensor indirectly depends on the prediction tensor, which in turn depends on the Tensor x. In your code snippet you did not include what x actually is; however, from the linked tutorial:

    x = tf.placeholder('float', [None, 784])
    y = tf.placeholder('float')
    

    So x is a placeholder, i.e. a Tensor that obtains its value directly from the user. The last line,

    train_neural_network(x)
    

    might suggest that train_neural_network(x) is an ordinary function call that takes an x and processes it on the fly, like you would expect from a regular function. It is not: the function uses references to the previously defined placeholder variables - dummies, really - to define a computation graph, which it then directly executes using a session. The graph, however, is constructed only once via neural_network_model(x) and then queried for a given number of epochs.
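
    To see this build-once / run-many pattern in isolation, here is a minimal, self-contained sketch (not from the tutorial) with a single placeholder and a single variable:

    import tensorflow as tf

    a = tf.placeholder('float', [None, 2])             # dummy input, like the tutorial's x
    w = tf.Variable(tf.random_normal([2, 1]))          # created once, when the graph is built
    out = tf.matmul(a, w)                               # node that depends on both a and w

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())     # w receives its random values here, once
        # Two queries of the same graph with different feeds; no variables are re-created.
        print(sess.run(out, feed_dict={a: [[1.0, 2.0]]}))
        print(sess.run(out, feed_dict={a: [[3.0, 4.0]]}))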

    What you missed is this:

    _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
    

    This queries the result of the optimizer operation and the cost tensor given that the input values are epoch_x for x and epoch_y for y, pulling data through all defined computation nodes, all the way back "down" to x. In order to obtain the cost, y is needed as well. Both are provided by the caller. The AdamOptimizer will update all trainable variables as part of its execution, changing the network's weights.
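
    If you want to convince yourself that the weights really change, here is a quick sketch (reusing the tutorial's names, to be placed inside the training loop) that reads one of the variables before and after a step:

    first_weight = tf.trainable_variables()[0]          # e.g. the hidden layer 1 weight matrix
    w_before = sess.run(first_weight)
    _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
    w_after = sess.run(first_weight)
    print((w_before != w_after).any())                   # True: Adam mutated the variable in place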

    After that,

    accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
    

    or, equivalently

    sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    

    then issues another evaluation of the same graph - without changing it - but this time using the inputs mnist.test.images for x and mnist.test.labels for y. It works because prediction itself depends on x, which is overridden to the user-provided values on each call to sess.run(...).
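
    The same goes for any other node in the graph. For instance, you could query prediction alone on a few test images (a sketch reusing the tutorial's names; y is not needed here because prediction only depends on x):

    some_images = mnist.test.images[:5]                           # any inputs of shape [?, 784]
    logits = sess.run(prediction, feed_dict={x: some_images})     # forward pass only, nothing is trained
    print(logits.argmax(axis=1))                                  # predicted digit per image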

    Here is what the graph looks like in TensorBoard. It's hard to tell, but the two placeholder nodes are on the bottom left, next to the orange node called "Variable" and in the center right, below the green "Slice_1".

    Here is how the relevant part of the network's graph looks; I exported it using TensorBoard. It's a bit hard to read since the nodes are not labeled manually (except for a couple I labeled myself), but there are six relevant points here. Placeholders are yellow: on the bottom right you'll find x, and y is on the center left. Green are the intermediate values that make sense to us: on the left is the prediction tensor, on the right is the tensor called correct. The blue parts are endpoints of the graph: on the top left is the cost tensor and on the top right you'll find accuracy. In essence, data flows from the bottom to the top.

    Network graph

    So, whenever you say "evaluate prediction given x", "evaluate accuracy given x and y" or "optimize my network given x and y", you really just provide values on the yellow ends and observe the outcome on the green or blue ones.
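
    In code, with batch_images and batch_labels standing in for whatever data you want to feed, those three statements map to (a sketch using the tutorial's names):

    sess.run(prediction, feed_dict={x: batch_images})                      # "evaluate prediction given x"
    sess.run(accuracy,   feed_dict={x: batch_images, y: batch_labels})     # "evaluate accuracy given x and y"
    sess.run(optimizer,  feed_dict={x: batch_images, y: batch_labels})     # "optimize my network given x and y"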