python · tensorflow · tensorboard

Why tensors not connected to gradients in TensorBoard?


For practice, I wanted to implement a model in TensorFlow that gives me back the square of the input. My code works correctly, but when I look at the computation graph in TensorBoard, the LOSS operation is not connected to the gradients subgraph, nor to Adam. Why is this? As I understand it, to compute the gradients, TensorFlow has to differentiate the loss.

Here is my code:

import numpy as np
import tensorflow as tf

np_inp = np.array([3, 6, 4, 2, 9, 11, 0.48, 22, -2.3, -0.48])
np_outp = np.power(np_inp, 2)

inputs = tf.Variable(np_inp, name='input', trainable=False)
outputs = tf.Variable(np_outp, name='output', trainable=False)

multiplier = tf.Variable(0.1, dtype=tf.float64, trainable=True,
                         name='multiplier')

mul = inputs * multiplier
predict = tf.square(mul, name='prediction')

loss = tf.math.reduce_sum(tf.math.square(predict-outputs), name='LOSS')
optimizer = tf.train.AdamOptimizer(0.1)
to_minimize = optimizer.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

logs_path = "./logs/unt"  # folder where the logs for TensorBoard will be saved
train_writer = tf.summary.FileWriter(logs_path, sess.graph)

for i in range(100):
  sess.run(to_minimize)
print(sess.run({'mult':multiplier}))

Tensorboard: https://gofile.io/?c=jxbWiG

Thanks in advance!


Solution

  • This can be counterintuitive, but the actual value of the loss is not used for the training itself (although it can be useful to plot it to track progress). What optimizers generally use is the gradient, that is, how a change in each variable would affect the loss value. To compute this, a tensor with the same shape as LOSS but filled with ones is created, and the gradient of each operation is computed through back-propagation. If you open the gradients box in the graph, you will see a LOSS_grad box representing this.

    [Screenshot: the LOSS_grad subgraph in the TensorBoard graph view]

    It is just a couple of nodes creating that tensor of ones, because the gradient of something with respect to itself is always one. From there, the rest of the gradients are computed through the chain. The sketch below shows this starting point directly.
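
    As a minimal sketch (the names grad_wrt_itself and grad_wrt_multiplier are mine, and it assumes the sess, loss and multiplier from the question's code are still available), you can build the same back-propagation path explicitly with tf.gradients, which is what minimize() does internally. Notice that the value of LOSS never flows into these tensors:

    # Inspect the gradient path explicitly (assumes the question's
    # `sess`, `loss` and `multiplier` are still in scope).
    grad_wrt_itself = tf.gradients(loss, loss)[0]             # d(LOSS)/d(LOSS)
    grad_wrt_multiplier = tf.gradients(loss, multiplier)[0]   # what Adam actually consumes

    print(sess.run(grad_wrt_itself))      # 1.0 -- the "ones" tensor back-propagation starts from
    print(sess.run(grad_wrt_multiplier))  # gradient of LOSS w.r.t. the trainable variable

    This is why TensorBoard draws the gradients and Adam subgraphs connected to the operations that produce the prediction, but not to the LOSS node's output value itself.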