In the TensorFlow multi-GPU CIFAR-10 example, for each GPU they compute the loss (lines 174-180):
    for i in xrange(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
          loss = tower_loss(scope)
Then, a few lines below (line 246), they evaluate loss with

    _, loss_value = sess.run([train_op, loss])

Which loss exactly is computed here?
I looked at the tower_loss function, but I don't see any incremental aggregation over all GPUs (towers). I understand that the whole graph is being executed (over all GPUs), but what value of the loss will be returned? Only the loss on the last GPU? I don't see any aggregation on the actual loss variable.
The computed loss is indeed only the loss on the last GPU. The Python variable loss is simply rebound on each iteration of the loop, so after the loop it refers to the Tensor built for the last tower.
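This is ordinary Python rebinding, nothing TensorFlow-specific: each iteration reassigns loss, so only the last tower's tensor remains referenced. A minimal self-contained sketch of the same pattern (TF1-style graph mode assumed; the tensor names here are illustrative, not the example's):

    import tensorflow as tf

    loss = None
    for i in range(2):
        with tf.device('/gpu:%d' % i):
            # Stand-in for tower_loss(scope): builds one loss tensor per tower.
            loss = tf.constant(0.0, name='tower_%d_loss' % i)

    # `loss` was rebound on every iteration, so it now refers
    # only to the tensor built in the last iteration.
    print(loss)  # e.g. Tensor("tower_1_loss:0", shape=(), dtype=float32, device=/device:GPU:1)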
You can also validate this easily by printing the Python variable that holds this tensor. E.g., adding print(loss) on line 244 (with a 2-GPU setup) prints:
Tensor("tower_1/total_loss_1:0", shape=(), dtype=float32, device=/device:GPU:1)