In the TensorFlow multi-GPU CIFAR-10 example, for each GPU they compute the loss (lines 174-180):
    for i in xrange(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
          loss = tower_loss(scope)
Then, a few lines below (line 246), they evaluate loss with

    _, loss_value = sess.run([train_op, loss])

Which loss exactly is computed here?
I looked at the tower_loss function, but I don't see any incremental aggregation over all GPUs (towers). I understand that the whole graph is being executed (over all GPUs), but what value of the loss will be returned? Only the loss on the last GPU? I don't see any aggregation on the actual loss variable.
The computed loss is indeed only the loss on the last GPU. The Python variable loss is simply rebound on each iteration of the loop, so after the loop it refers to the Tensor built for the last tower.
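This is ordinary Python rebinding, nothing TensorFlow-specific: each iteration reassigns loss, so only the last tower's tensor remains referenced. A minimal self-contained sketch of the same pattern (TF1-style graph mode assumed; the tensor names here are illustrative, not the example's):

    import tensorflow as tf

    loss = None
    for i in range(2):
        with tf.device('/gpu:%d' % i):
            # Stand-in for tower_loss(scope): builds one loss tensor per tower.
            loss = tf.constant(0.0, name='tower_%d_loss' % i)

    # `loss` was rebound on every iteration, so it now refers
    # only to the tensor built in the last iteration.
    print(loss)  # e.g. Tensor("tower_1_loss:0", shape=(), dtype=float32, device=/device:GPU:1)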
You can also validate this easily by printing the Python variable that holds this tensor. E.g., adding print(loss) on line 244 (with a 2-GPU setup) prints:
Tensor("tower_1/total_loss_1:0", shape=(), dtype=float32, device=/device:GPU:1)