For practice, I wanted to implement a model in TensorFlow that gives me back the square of the input. My code works correctly, but when I look at the computation graph in TensorBoard, the LOSS operation is connected neither to the gradients subgraph nor to Adam. Why is this? As I understand it, to compute the gradients, TensorFlow has to differentiate the loss.
Here is my code:
import numpy as np
import tensorflow as tf
np_inp = np.array([3, 6, 4, 2, 9, 11, 0.48, 22, -2.3, -0.48])
np_outp = np.power(np_inp, 2)
inputs = tf.Variable(np_inp, name='input', trainable=False)
outputs = tf.Variable(np_outp, name='output', trainable=False)
multiplier = tf.Variable(0.1, dtype=tf.float64, trainable=True,
                         name='multiplier')
mul = inputs * multiplier
predict = tf.square(mul, name='prediction')
loss = tf.math.reduce_sum(tf.math.square(predict-outputs), name='LOSS')
optimizer = tf.train.AdamOptimizer(0.1)
to_minimize = optimizer.minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
logs_path = "./logs/unt"  # folder where the TensorBoard logs will be written
train_writer = tf.summary.FileWriter(logs_path, sess.graph)
for i in range(100):
    sess.run(to_minimize)

print(sess.run({'mult': multiplier}))
TensorBoard: https://gofile.io/?c=jxbWiG
Thanks in advance!
This can be counterintuitive, but the actual value of the loss is not used for the training itself (although it can be useful to plot it to see its progress). What optimizers generally use is the gradient, that is, how a change in each variable would affect the loss value. To compute it, a tensor with the same shape as LOSS but filled with ones is created, and the gradient of each operation is then computed through back-propagation.
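As a rough sketch of what that means for your code (reusing the TF 1.x session and graph from the question), you can ask the optimizer for the gradients it works with; it is this gradient value, not the loss value, that Adam plugs into its update rule:

# compute_gradients builds the gradient ops without applying an update,
# returning a list of (gradient, variable) pairs
grads_and_vars = optimizer.compute_gradients(loss, var_list=[multiplier])
# evaluating the pair shows the number Adam actually consumes
print(sess.run(grads_and_vars))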
If you open the gradients box in the graph, you will see a LOSS_grad box representing this seed. It is just a couple of nodes that create that tensor of ones, because the gradient of something with respect to itself is always one. From there, the rest of the gradients are computed back through the graph.
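As a small illustration of that seed (again assuming the TF 1.x graph above), tf.gradients starts from d(LOSS)/d(LOSS), which is just a ones tensor with the loss's shape, and then propagates it backwards with the chain rule:

# gradient of the loss with respect to itself: the "ones" seed from LOSS_grad
seed = tf.gradients(loss, loss)
# gradient propagated back to the trainable variable, which is what Adam uses
grad_wrt_multiplier = tf.gradients(loss, multiplier)
print(sess.run([seed, grad_wrt_multiplier]))  # roughly [[1.0], [<gradient>]]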